Benchmarks
Free · GET /api/benchmarks

The /api/benchmarks endpoint returns benchmark scores for major AI models across SWE-bench (real software engineering tasks), MMLU-Pro (general reasoning), HumanEval (code generation), GPQA Diamond (graduate science), and MATH (competition math). Scores are updated weekly as new results are published.
When to use this endpoint
Use this endpoint when your agent needs to compare model capability on a specific dimension. For per-benchmark leaderboard views, see /benchmarks/[name]; for the time series of a single model on a single benchmark, use /api/premium/history/benchmarks/series.
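For consumers not using an SDK, here is a minimal sketch of the raw HTTP call using Python's requests library, assuming a hypothetical base URL of https://api.tensorfeed.dev (substitute the real API host):

import requests

# Hypothetical base URL -- substitute the real API host.
BASE_URL = "https://api.tensorfeed.dev"

resp = requests.get(f"{BASE_URL}/api/benchmarks", timeout=10)
resp.raise_for_status()
data = resp.json()
print(data["lastUpdated"], len(data["models"]), "models returned")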
Example response
{
  "ok": true,
  "lastUpdated": "2026-04-24",
  "benchmarks": [
    { "id": "swe_bench", "name": "SWE-bench", "description": "Real GitHub issue resolution", "maxScore": 100 }
  ],
  "models": [
    {
      "model": "Claude Opus 4.7",
      "provider": "Anthropic",
      "scores": { "swe_bench": 65.4, "mmlu_pro": 93.8, "human_eval": 96.2 }
    }
  ]
}
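The benchmarks array carries per-benchmark metadata, so a consumer can normalize raw scores against maxScore before comparing across benchmarks. A minimal sketch, assuming data is the parsed response body from the raw HTTP example above:

# Normalize each model's scores to a 0-1 range using each
# benchmark's maxScore, so scores on different scales compare.
max_scores = {b["id"]: b["maxScore"] for b in data["benchmarks"]}

for m in data["models"]:
    normalized = {
        bench: score / max_scores[bench]
        for bench, score in m["scores"].items()
        if bench in max_scores  # skip scores without a metadata entry
    }
    print(m["model"], normalized)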
Code samples

Python SDK
from tensorfeed import TensorFeed
tf = TensorFeed()
b = tf.benchmarks()
# Sort models by SWE-bench desc
ranked = sorted(b["models"], key=lambda m: m["scores"].get("swe_bench", 0), reverse=True)
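ranked[0] then holds the model with the highest published SWE-bench score; the .get("swe_bench", 0) default keeps models without a SWE-bench entry at the bottom of the ranking instead of raising a KeyError.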
TypeScript SDK

import { TensorFeed } from 'tensorfeed';
const tf = new TensorFeed();
const { models } = await tf.benchmarks();
// Rank by SWE-bench score, highest first. A typeof check keeps a
// legitimate score of 0 from being dropped by a truthiness filter.
const top = models
  .filter(m => typeof m.scores.swe_bench === 'number')
  .sort((a, b) => (b.scores.swe_bench ?? 0) - (a.scores.swe_bench ?? 0));
FAQ

Where do the benchmark scores come from?
Scores come from each benchmark's official leaderboard and, where applicable, from vendor-published numbers verified against the benchmark's test methodology. We do not run independent benchmark evaluations.
How current are the benchmark scores?
Scores are updated weekly; the catalog cron that ingests them runs daily, so new model launches typically land within a few days of public score publication.
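Because scores refresh on a weekly cadence, consumers that cache responses may want to check lastUpdated for staleness. A minimal sketch, assuming data is the parsed response body from the examples above and an illustrative 14-day threshold:

from datetime import date, timedelta

# Treat data older than 14 days (two update cycles) as stale.
# The threshold is an illustrative choice, not an API guarantee.
last_updated = date.fromisoformat(data["lastUpdated"])
if date.today() - last_updated > timedelta(days=14):
    print("Benchmark data may be stale; consider refetching.")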