Inference Provider Matrix

Free
GET /api/inference-providers

The /api/inference-providers endpoint returns the cross-provider pricing matrix for open-weight models: the same Llama 4 Maverick / Scout / DeepSeek V4 / Mixtral / Qwen 2.5 weights at different prices across 8 hosted providers. Each offer carries the input price, output price, blended price, output TPS, the context window the provider serves, feature flags (function calling, JSON mode, vision), and the provider's docs URL.
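
A minimal raw-HTTP call for comparison, assuming the Python requests library; the base URL is the same one the fetch sample below uses:

import requests

# Fetch the matrix, filtered to Meta-origin models via the family parameter.
resp = requests.get(
    "https://tensorfeed.ai/api/inference-providers",
    params={"family": "Meta"},
    timeout=10,
)
resp.raise_for_status()
matrix = resp.json()
print(matrix["lastUpdated"], len(matrix["models"]), "models tracked")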

When to use this endpoint

When your agent is picking the cheapest hosted inference path for an open-weight model. For a single-model lookup use /api/inference-providers/cheapest instead so you do not need the full matrix.
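
A sketch of the single-model path. The endpoint path comes from this page, but the model query parameter name is an assumption, not documented here:

import requests

# /cheapest returns one offer instead of the full matrix.
# The "model" parameter name below is assumed; check that endpoint's
# own reference page for the real signature.
resp = requests.get(
    "https://tensorfeed.ai/api/inference-providers/cheapest",
    params={"model": "llama-4-scout"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())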

Parameters

Name     In      Type     Description
family   query   string   Filter by origin lab (Meta, DeepSeek, Mistral, Alibaba), e.g. Meta

Example response

{
  "ok": true,
  "lastUpdated": "2026-04-30",
  "tracked_providers": ["Together AI", "Fireworks", "DeepInfra", "Groq", "OpenRouter", "Replicate", "Anyscale", "DeepSeek"],
  "models": [
    {
      "modelId": "llama-4-scout",
      "modelName": "Llama 4 Scout",
      "family": "Meta",
      "paramsB": 109,
      "license": "Llama 4 Community License",
      "openWeights": true,
      "offers": [
        { "provider": "DeepInfra", "inputPrice": 0.16, "outputPrice": 0.55, "blendedPrice": 0.355, "contextWindow": 10000000, "outputTPS": 170, "features": ["function-calling", "vision"] }
      ]
    }
  ]
}
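
In the sample above, blendedPrice is consistent with a simple average of input and output price: (0.16 + 0.55) / 2 = 0.355. A quick check of that assumption against every offer; the formula is inferred from this one sample, not a documented guarantee:

from tensorfeed import TensorFeed

tf = TensorFeed()
# Unfiltered call; family is an optional filter per the parameters table.
matrix = tf.inference_providers()
for model in matrix["models"]:
    for offer in model["offers"]:
        # Assumed formula, inferred from the sample offer above:
        # 0.355 == (0.16 + 0.55) / 2.
        expected = (offer["inputPrice"] + offer["outputPrice"]) / 2
        assert abs(offer["blendedPrice"] - expected) < 1e-9, (model["modelId"], offer["provider"])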

Code samples

Python SDK

from tensorfeed import TensorFeed

tf = TensorFeed()
# Pull the matrix filtered to Meta-origin models.
matrix = tf.inference_providers(family="Meta")
for m in matrix["models"]:
    # Cheapest offer by blended price ($/Mtok averaged over input and output).
    cheapest = min(m["offers"], key=lambda o: o["blendedPrice"])
    print(f"{m['modelName']:<28} {cheapest['provider']:<14} ${cheapest['blendedPrice']:.3f}")

TypeScript (fetch)

// Same query via raw fetch; no SDK required.
const res = await fetch("https://tensorfeed.ai/api/inference-providers?family=Meta");
const { models } = await res.json();
for (const m of models) {
  // Cheapest offer by blended price.
  const cheapest = m.offers.reduce((a, b) => (a.blendedPrice < b.blendedPrice ? a : b));
  console.log(`${m.modelName}: ${cheapest.provider} @ $${cheapest.blendedPrice}`);
}

FAQ

Why is the same model priced differently across providers?

Each inference provider runs its own GPU fleet, quantization strategy, and batching policy. Together and Fireworks anchor on FP8 Turbo variants for speed. DeepInfra optimizes for raw cost. Groq runs custom LPU silicon at very high throughput with a context-window trade-off. The price spread on a single model is routinely 3-10x.
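
You can measure that spread directly from the matrix by comparing the priciest and cheapest offer per model, reusing the SDK call from the Python sample above:

from tensorfeed import TensorFeed

tf = TensorFeed()
matrix = tf.inference_providers()
for m in matrix["models"]:
    prices = sorted(o["blendedPrice"] for o in m["offers"])
    if len(prices) > 1:
        # Ratio of the priciest to the cheapest hosted offer for the same weights.
        print(f"{m['modelName']}: {prices[-1] / prices[0]:.1f}x spread")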

How fresh is this data?

Editorial weekly refresh. Provider pricing changes more often than embedding pricing but less often than spot-priced compute, so a weekly cadence is the right granularity.

Related endpoints