Benchmark Series
1 credit · GET /api/premium/history/benchmarks/series

The benchmark series endpoint returns the daily score evolution for a single benchmark on a single model. Useful for tracking whether a model is improving (provider released a fine-tune), regressing (provider downgraded the API endpoint to a smaller model), or holding steady.
When to use this endpoint
Use it when a research agent needs to track a benchmark's trajectory over time. For a snapshot leaderboard at a single date, see /benchmarks/[name].
Parameters
| Name | In | Type | Description |
|---|---|---|---|
| model* | query | string | Model id or display name |
| benchmark* | query | string | Benchmark key (swe_bench, mmlu_pro, gpqa_diamond, math, human_eval) |
| from | query | string | Start date YYYY-MM-DD UTC |
| to | query | string | End date YYYY-MM-DD UTC |
* required
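If you are calling the endpoint directly rather than through an SDK, the required and optional parameters above go in the query string. A minimal sketch of building the request URL in Python, assuming a hypothetical base host (`api.example.com` is a placeholder, not the real TensorFeed host):

```python
from urllib.parse import urlencode

# Placeholder host -- substitute your actual TensorFeed API base URL.
BASE = "https://api.example.com"

params = {
    "model": "Claude Opus 4.7",   # required: model id or display name
    "benchmark": "swe_bench",     # required: benchmark key
    "from": "2026-04-01",         # optional: start date, YYYY-MM-DD UTC
    "to": "2026-04-27",           # optional: end date, YYYY-MM-DD UTC
}

url = f"{BASE}/api/premium/history/benchmarks/series?{urlencode(params)}"
print(url)
```

`urlencode` handles the escaping (spaces in display names become `+`), so you can pass either a model id or a display name without worrying about URL-safety.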
Example response
{
"ok": true,
"model": "Claude Opus 4.7",
"benchmark": "swe_bench",
"points": [
{ "date": "2026-04-01", "score": 70.0 },
{ "date": "2026-04-27", "score": 73.4 }
],
"summary": { "first": { "score": 70.0 }, "latest": { "score": 73.4 }, "delta_pp": 3.4 }
}

Code samples
Python SDK
from tensorfeed import TensorFeed
tf = TensorFeed(token="tf_live_...")
s = tf.benchmark_series(model="Claude Opus 4.7", benchmark="swe_bench")
print(f"SWE-bench moved {s['summary']['delta_pp']} pp")

TypeScript SDK
import { TensorFeed } from 'tensorfeed';
const tf = new TensorFeed({ token: 'tf_live_...' });
const s = await tf.benchmarkSeries({ model: 'Claude Opus 4.7', benchmark: 'swe_bench' });

MCP tool
Available via the TensorFeed MCP server as benchmark_series. Add npx -y @tensorfeed/mcp-server to your Claude Desktop or Claude Code MCP config.
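For Claude Desktop, the entry typically goes in the `mcpServers` section of `claude_desktop_config.json`. A sketch (the `"tensorfeed"` key name is arbitrary; the surrounding shape is the standard Claude Desktop MCP config format):

```json
{
  "mcpServers": {
    "tensorfeed": {
      "command": "npx",
      "args": ["-y", "@tensorfeed/mcp-server"]
    }
  }
}
```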
FAQ
Why would a benchmark score change over time on the same model?
Three common reasons: provider released a fine-tune or new system prompt and updated the score, the test methodology changed (e.g. SWE-bench Verified subset got new tasks), or the score was recalculated against a different harness. Tracking the trajectory surfaces these changes.
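A trajectory check can be automated from the `points` array in the example response above. A minimal sketch (the ±1.0 pp threshold is an arbitrary illustration, not part of the API):

```python
# Classify a benchmark trajectory from the series `points`.
# Data shape matches the example response; threshold is illustrative.
points = [
    {"date": "2026-04-01", "score": 70.0},
    {"date": "2026-04-27", "score": 73.4},
]

# Same quantity the API reports as summary.delta_pp: latest minus first.
delta_pp = round(points[-1]["score"] - points[0]["score"], 1)

if delta_pp >= 1.0:
    verdict = "improving"
elif delta_pp <= -1.0:
    verdict = "regressing"
else:
    verdict = "holding steady"

print(f"{delta_pp:+} pp -> {verdict}")  # +3.4 pp -> improving
```

In practice you would also compare consecutive points, since a methodology change often shows up as a single-day step rather than a gradual drift.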