AI Research
Latest papers, benchmarks, and research developments
Benchmark Tracker
| Benchmark | Claude Opus 4.6 | GPT-4.5 | Gemini 2.5 Pro | Llama 4 |
|---|---|---|---|---|
| MMLU | 92.4 | 90.8 | 91.1 | 86.3 |
| HumanEval | 95.1 | 93.7 | 94.2 | 88.9 |
| GPQA | 74.6 | 71.2 | 72.8 | 63.5 |
Scores are percentages taken from published results as of March 2026. Higher is better.