
Benchmark Series

1 credit
GET /api/premium/history/benchmarks/series

The benchmark series endpoint returns the daily score evolution for a single benchmark on a single model. Use it to track whether a model is improving (the provider shipped a fine-tune), regressing (the provider downgraded the API endpoint to a smaller model), or holding steady.

When to use this endpoint

Use it when a research agent needs to track a benchmark's trajectory over time. For a snapshot leaderboard at a single date, use /benchmarks/[name] instead.

Parameters

Name         In      Type     Description
model *      query   string   Model id or display name
benchmark *  query   string   Benchmark key (swe_bench, mmlu_pro, gpqa_diamond, math, human_eval)
from         query   string   Start date, YYYY-MM-DD (UTC)
to           query   string   End date, YYYY-MM-DD (UTC)

* required
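
If you are not using an SDK, the endpoint can be called directly over HTTPS. Below is a minimal Python sketch; the base URL is a placeholder and the Bearer-token header is an assumption, so substitute the host and auth scheme your account actually uses.

# Raw call to the series endpoint with requests.
# BASE_URL is a placeholder; the Authorization scheme is assumed.
import requests

BASE_URL = "https://api.tensorfeed.example"
TOKEN = "tf_live_..."

resp = requests.get(
    f"{BASE_URL}/api/premium/history/benchmarks/series",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={
        "model": "Claude Opus 4.7",
        "benchmark": "swe_bench",
        "from": "2026-04-01",  # optional start date, YYYY-MM-DD (UTC)
        "to": "2026-04-27",    # optional end date, YYYY-MM-DD (UTC)
    },
    timeout=30,
)
resp.raise_for_status()
series = resp.json()
print(series["summary"]["delta_pp"])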

Example response

{
  "ok": true,
  "model": "Claude Opus 4.7",
  "benchmark": "swe_bench",
  "points": [
    { "date": "2026-04-01", "score": 70.0 },
    { "date": "2026-04-27", "score": 73.4 }
  ],
  "summary": { "first": { "score": 70.0 }, "latest": { "score": 73.4 }, "delta_pp": 3.4 }
}
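
To turn a series into the improving / regressing / holding-steady signal described above, a small post-processing step is enough. A minimal Python sketch, assuming the response shape shown here (the 1 pp threshold is an illustrative cutoff, not part of the API):

def classify_trend(series, threshold_pp=1.0):
    # series is the parsed JSON response from the series endpoint;
    # threshold_pp is an arbitrary illustration of "meaningful movement".
    delta = series["summary"]["delta_pp"]
    if delta >= threshold_pp:
        return "improving"
    if delta <= -threshold_pp:
        return "regressing"
    return "steady"

With the example response above, classify_trend returns "improving" (delta_pp is 3.4).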

Code samples

Python SDK

from tensorfeed import TensorFeed

tf = TensorFeed(token="tf_live_...")
s = tf.benchmark_series(model="Claude Opus 4.7", benchmark="swe_bench")
print(f"SWE-bench moved {s['summary']['delta_pp']} pp")

TypeScript SDK

import { TensorFeed } from 'tensorfeed';

const tf = new TensorFeed({ token: 'tf_live_...' });
const s = await tf.benchmarkSeries({ model: 'Claude Opus 4.7', benchmark: 'swe_bench' });
console.log(`SWE-bench moved ${s.summary.delta_pp} pp`);

MCP tool

Available via the TensorFeed MCP server as benchmark_series. Add npx -y @tensorfeed/mcp-server to your Claude Desktop or Claude Code MCP config.
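
For reference, a Claude Desktop entry for the server could look like the snippet below. The tensorfeed key name and the TENSORFEED_TOKEN environment variable are illustrative assumptions; check the @tensorfeed/mcp-server documentation for the exact settings.

{
  "mcpServers": {
    "tensorfeed": {
      "command": "npx",
      "args": ["-y", "@tensorfeed/mcp-server"],
      "env": { "TENSORFEED_TOKEN": "tf_live_..." }
    }
  }
}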

FAQ

Why would a benchmark score change over time on the same model?

Three common reasons: the provider released a fine-tune or a new system prompt and the score was updated, the test methodology changed (e.g. the SWE-bench Verified subset gained new tasks), or the score was recalculated against a different harness. Tracking the trajectory surfaces these changes.

Related endpoints