
Inference Cheapest

Free
GET /api/inference-providers/cheapest

Agent-friendly entry point into the inference provider matrix. Pass a canonical model id and get back the three cheapest offers, sorted by blended price by default. It skips the full matrix payload, which is useful when an agent is just picking the cheapest path and does not need every column.

When to use this endpoint

When your agent needs the cheapest inference path for a specific open-weight model in a single call. For a different sort, pass ?sort=input|output|tps_desc.

Parameters

Name    In     Type    Description
model*  query  string  Canonical model id: llama-4-maverick, llama-4-scout, llama-3.1-70b, llama-3.1-405b, deepseek-v4-pro, deepseek-v4-flash, mixtral-8x22b, qwen-2.5-72b. Example: llama-4-scout
sort    query  string  Sort order: blended (default), input, output, tps_desc. Example: tps_desc

* required
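For raw HTTP callers, the query string is just the two parameters above. A minimal sketch of a URL builder (the `cheapest_url` helper is hypothetical, not part of any SDK) that omits `sort` when it matches the API default:

```python
from urllib.parse import urlencode

BASE = "https://tensorfeed.ai/api/inference-providers/cheapest"

def cheapest_url(model: str, sort: str = "blended") -> str:
    """Build the request URL; sort is omitted when it is the default (blended)."""
    params = {"model": model}
    if sort != "blended":
        params["sort"] = sort
    return f"{BASE}?{urlencode(params)}"

print(cheapest_url("llama-4-scout", sort="tps_desc"))
```

Fetching the resulting URL with any HTTP client returns the JSON shown in the example response below.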

Example response

{
  "ok": true,
  "modelId": "llama-4-scout",
  "modelName": "Llama 4 Scout",
  "family": "Meta",
  "sortBy": "blended",
  "cheapest": { "provider": "DeepInfra", "blendedPrice": 0.355, "inputPrice": 0.16, "outputPrice": 0.55, "contextWindow": 10000000, "outputTPS": 170 },
  "top3": [
    { "provider": "DeepInfra", "blendedPrice": 0.355 },
    { "provider": "OpenRouter", "blendedPrice": 0.385 },
    { "provider": "Groq", "blendedPrice": 0.385 }
  ]
}

Code samples

Python SDK

from tensorfeed import TensorFeed
tf = TensorFeed()
result = tf.inference_cheapest("llama-4-scout")
print(f"Cheapest: {result['cheapest']['provider']} at ${result['cheapest']['blendedPrice']:.3f}/1M blended")

TypeScript (fetch)

const res = await fetch("https://tensorfeed.ai/api/inference-providers/cheapest?model=llama-4-scout");
if (!res.ok) throw new Error(`cheapest lookup failed: HTTP ${res.status}`);
const result = await res.json();
console.log(`Cheapest: ${result.cheapest.provider} @ $${result.cheapest.blendedPrice}`);

FAQ

What if my model is not in the matrix?

The endpoint returns 404 model_not_found. List available models at /api/inference-providers. We track the most-served open-weight models; the catalog is curated editorially, so if you need a model we do not cover, we add new models on demand.
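An agent calling this endpoint should branch on that status code rather than assume a payload. A sketch of the fallback logic (the `resolve_cheapest` helper is hypothetical, written for illustration):

```python
def resolve_cheapest(status_code: int, payload: dict):
    """Return the cheapest offer, or None when the model is not in the matrix."""
    if status_code == 404:
        # model_not_found: caller should list models at /api/inference-providers
        return None
    if not payload.get("ok"):
        raise RuntimeError(f"unexpected API error: {payload}")
    return payload["cheapest"]
```

On `None`, the caller can fetch the model list and retry with a canonical id.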

Why is the sort default blended and not input?

Because real workloads have non-zero output usage. Blended at 1:1 input:output ratio is a better proxy for actual cost than input-only. If your workload is heavy-input or heavy-output, pass ?sort=input or ?sort=output.
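At a 1:1 ratio, blended price works out to a simple average of the input and output per-Mtok prices, which is consistent with the llama-4-scout example response above:

```python
# Blended price at a 1:1 input:output ratio is the average of the two prices.
input_price, output_price = 0.16, 0.55  # DeepInfra llama-4-scout, $/Mtok
blended = (input_price + output_price) / 2
print(blended)  # 0.355, matching cheapest.blendedPrice in the example response
```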

Related endpoints