AI Benchmarks
Compare leading AI models across standardized benchmarks. Last updated 2026-03-29.
MMLU-Pro — General knowledge and reasoning across 14 subject categories. Scores are out of 100.
| Rank | Model | Provider | Score (/100) | Released |
|---|---|---|---|---|
| #1 | Claude Opus 4.6 | Anthropic | 92.4 | 2026-03 |
| #2 | o1 | OpenAI | 91.8 | 2025-09 |
| #3 | Gemini 2.5 Pro | Google | 91.2 | 2026-01 |
| #4 | GPT-4.5 | OpenAI | 90.1 | 2025-12 |
| #5 | Llama 4 Maverick | Meta | 89.3 | 2026-03 |
| #6 | Claude Sonnet 4.6 | Anthropic | 88.7 | 2026-02 |
| #7 | DeepSeek V3 | DeepSeek | 88.1 | 2025-12 |
| #8 | GPT-4o | OpenAI | 87.2 | 2025-05 |
| #9 | Mistral Large | Mistral | 86.8 | 2025-11 |
| #10 | o3-mini | OpenAI | 86.3 | 2025-11 |
| #11 | Llama 4 Scout | Meta | 85.9 | 2026-02 |
| #12 | Gemini 2.0 Flash | Google | 84.5 | 2025-10 |
| #13 | Claude Haiku 4.5 | Anthropic | 82.1 | 2026-01 |
| #14 | Mistral Small | Mistral | 78.4 | 2025-09 |
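The ranks in the table are simply the models ordered by score, highest first. A minimal Python sketch of that ordering, using a few rows taken from the table (the `Entry` record and its field names are assumptions for illustration, not an official schema):

```python
# Illustrative sketch: represent a few leaderboard rows and derive ranks
# by sorting on score, descending. Values are copied from the table above.
from dataclasses import dataclass

@dataclass
class Entry:
    model: str
    provider: str
    score: float  # MMLU-Pro score out of 100

entries = [
    Entry("GPT-4o", "OpenAI", 87.2),
    Entry("Claude Opus 4.6", "Anthropic", 92.4),
    Entry("Gemini 2.5 Pro", "Google", 91.2),
    Entry("o1", "OpenAI", 91.8),
]

# Rank = 1-based position after sorting by score, highest first.
ranked = sorted(entries, key=lambda e: e.score, reverse=True)
for rank, e in enumerate(ranked, start=1):
    print(f"#{rank} {e.model} ({e.provider}): {e.score}/100")
```

Run as-is, this prints the four sample models in the same relative order they hold in the full table, from Claude Opus 4.6 down to GPT-4o.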