AI Research

Latest papers, benchmarks, and research developments

Latest Research

Benchmark Tracker

Benchmark   Claude Opus 4.6   GPT-4.5   Gemini 2.5 Pro   Llama 4
MMLU        92.4              90.8      91.1             86.3
HumanEval   95.1              93.7      94.2             88.9
GPQA        74.6              71.2      72.8             63.5

Scores represent published results as of March 2026. Higher is better.