Benchmarks

Free
GET /api/benchmarks

The /api/benchmarks endpoint returns benchmark scores for major AI models across SWE-bench (real software engineering tasks), MMLU-Pro (general reasoning), HumanEval (code generation), GPQA Diamond (graduate-level science), and MATH (competition math). Scores are updated weekly as new results are published.
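
For agents that call the API directly rather than through an SDK, this is a plain GET request. The sketch below uses Python's requests library; the base URL is a placeholder assumption, not a documented host, and the ok, lastUpdated, benchmarks, and models fields follow the example response further down.

import requests

# Placeholder host -- substitute your actual TensorFeed base URL.
BASE_URL = "https://api.tensorfeed.example"

resp = requests.get(f"{BASE_URL}/api/benchmarks", timeout=10)
resp.raise_for_status()
data = resp.json()

if data["ok"]:
    print(f"Benchmark data as of {data['lastUpdated']}")
    print(f"{len(data['benchmarks'])} benchmarks, {len(data['models'])} models tracked")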

When to use this endpoint

Use this endpoint when your agent needs to compare model capability on a specific dimension. For per-benchmark leaderboard views, see /benchmarks/[name]; for a time series of one model on one benchmark, use /api/premium/history/benchmarks/series.
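
As a concrete example, the sketch below uses the Python SDK shown under Code samples to pick the strongest model on a single dimension, HumanEval code generation, via the human_eval key from the example response. This is an illustrative sketch, not part of the SDK itself.

from tensorfeed import TensorFeed

tf = TensorFeed()
data = tf.benchmarks()

# Compare models on one dimension -- here HumanEval (code generation).
# Models that have not reported a human_eval score are skipped.
best = max(
    (m for m in data["models"] if "human_eval" in m["scores"]),
    key=lambda m: m["scores"]["human_eval"],
)
print(f"{best['model']} ({best['provider']}): {best['scores']['human_eval']}")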

Example response

{
  "ok": true,
  "lastUpdated": "2026-04-24",
  "benchmarks": [
    { "id": "swe_bench", "name": "SWE-bench", "description": "Real GitHub issue resolution", "maxScore": 100 }
  ],
  "models": [
    {
      "model": "Claude Opus 4.7",
      "provider": "Anthropic",
      "scores": { "swe_bench": 65.4, "mmlu_pro": 93.8, "human_eval": 96.2 }
    }
  ]
}
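
The benchmarks array carries the metadata (id, display name, description, maxScore) and each entry in models carries raw scores keyed by benchmark id. The sketch below, which assumes data holds a parsed response (from the SDK or the raw GET above), joins the two and normalizes each score against its benchmark's maxScore; it is illustrative only.

# Join benchmark metadata with per-model scores, normalizing to 0-1
# using each benchmark's maxScore.
benchmark_meta = {b["id"]: b for b in data["benchmarks"]}

table = {}
for entry in data["models"]:
    table[entry["model"]] = {
        benchmark_meta[bid]["name"]: score / benchmark_meta[bid]["maxScore"]
        for bid, score in entry["scores"].items()
        if bid in benchmark_meta
    }

# e.g. table["Claude Opus 4.7"]["SWE-bench"] -> 0.654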

Code samples

Python SDK

from tensorfeed import TensorFeed

tf = TensorFeed()
b = tf.benchmarks()
# Sort models by SWE-bench desc
ranked = sorted(b["models"], key=lambda m: m["scores"].get("swe_bench", 0), reverse=True)

TypeScript SDK

import { TensorFeed } from 'tensorfeed';

const tf = new TensorFeed();
const { models } = await tf.benchmarks();
// Rank models by SWE-bench score, descending; skip models without one.
const top = models
  .filter(m => m.scores.swe_bench != null)
  .sort((a, b) => b.scores.swe_bench - a.scores.swe_bench);

FAQ

Where do the benchmark scores come from?

Scores come from each benchmark's official leaderboard, supplemented where applicable by vendor-published numbers verified against the benchmark's test methodology. We do not run independent benchmark evaluations.

How current are the benchmark scores?

Benchmark data is refreshed by the daily catalog cron, with score changes typically landing weekly as new results are published. Newly launched models usually appear within a few days of their public score publication.

Related endpoints