一些参数:
AIME 24 | AIME 25 | OmniMath | GPQA-D | LiveCodeBench (8/1/24–2/1/25) | |
---|---|---|---|---|---|
Phi-4-reasoning | 75.3 | 62.9 | 76.6 | 65.8 | 53.8 |
Phi-4-reasoning-plus | 81.3 | 78.0 | 81.9 | 68.9 | 53.1 |
OpenThinker2-32B | 58.0 | 58.0 | — | 64.1 | — |
QwQ 32B | 79.5 | 65.8 | — | 59.5 | 63.4 |
EXAONE-Deep-32B | 72.1 | 65.8 | — | 66.1 | 59.5 |
DeepSeek-R1-Distill-70B | 69.3 | 51.5 | 63.4 | 66.2 | 57.5 |
DeepSeek-R1 | 78.7 | 70.4 | 85.0 | 73.0 | 62.8 |
o1-mini | 63.6 | 54.8 | — | 60.0 | 53.8 |
o1 | 74.6 | 75.3 | 67.5 | 76.7 | 71.0 |
o3-mini | 88.0 | 78.0 | 74.6 | 77.7 | 69.5 |
Claude-3.7-Sonnet | 55.3 | 58.7 | 54.6 | 76.8 | — |
Gemini-2.5-Pro | 92.0 | 86.7 | 61.1 | 84.0 | 69.2 |