Inference Provider Matrix
Free · GET /api/inference-providers

The /api/inference-providers endpoint returns the cross-provider pricing matrix for open-weight models: the same Llama 4 Maverick / Scout, DeepSeek V4, Mixtral, and Qwen 2.5 weights served at different prices across 8 hosted providers. Each offer carries the input price, output price, blended price, output TPS, the context window the provider serves, feature flags (function calling, JSON mode, vision), and a link to the provider's docs.
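If you are not using an SDK, the endpoint is a plain HTTPS GET. Below is a minimal sketch using Python's requests library against the same public base URL as the TypeScript sample further down; it assumes the free tier needs no authentication header, since none is documented on this page.

import requests

# Fetch the cross-provider matrix, filtered to Meta-family models.
resp = requests.get(
    "https://tensorfeed.ai/api/inference-providers",
    params={"family": "Meta"},
    timeout=10,
)
resp.raise_for_status()
matrix = resp.json()

# Each model entry carries one offer per provider that serves it.
for model in matrix["models"]:
    for offer in model["offers"]:
        print(model["modelId"], offer["provider"], offer["blendedPrice"])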
When to use this endpoint
Use it when your agent is picking the cheapest hosted inference path for an open-weight model. For a single-model lookup, use /api/inference-providers/cheapest instead so you do not need to pull the full matrix.
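A sketch of that single-model lookup follows. The /api/inference-providers/cheapest endpoint is documented separately; the model query parameter name and the response shape assumed here are for illustration only.

import requests

# Hypothetical call to the single-model endpoint; the "model" parameter
# name is an assumption, not confirmed by this page.
resp = requests.get(
    "https://tensorfeed.ai/api/inference-providers/cheapest",
    params={"model": "llama-4-scout"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())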
Parameters
| Name | In | Type | Description |
|---|---|---|---|
| family | query | string | Filter by origin lab (Meta, DeepSeek, Mistral, Alibaba), e.g. Meta |
Example response
{
  "ok": true,
  "lastUpdated": "2026-04-30",
  "tracked_providers": ["Together AI", "Fireworks", "DeepInfra", "Groq", "OpenRouter", "Replicate", "Anyscale", "DeepSeek"],
  "models": [
    {
      "modelId": "llama-4-scout",
      "modelName": "Llama 4 Scout",
      "family": "Meta",
      "paramsB": 109,
      "license": "Llama 4 Community License",
      "openWeights": true,
      "offers": [
        { "provider": "DeepInfra", "inputPrice": 0.16, "outputPrice": 0.55, "blendedPrice": 0.355, "contextWindow": 10000000, "outputTPS": 170, "features": ["function-calling", "vision"] }
      ]
    }
  ]
}
Code samples
Python SDK
from tensorfeed import TensorFeed

tf = TensorFeed()
matrix = tf.inference_providers(family="Meta")

# Print the cheapest hosted offer per model, ranked by blended price.
for m in matrix["models"]:
    cheapest = min(m["offers"], key=lambda o: o["blendedPrice"])
    print(f"{m['modelName']:<28} {cheapest['provider']:<14} ${cheapest['blendedPrice']:.3f}")
TypeScript SDK
const res = await fetch("https://tensorfeed.ai/api/inference-providers?family=Meta");
const { models } = await res.json();

for (const m of models) {
  // Pick the offer with the lowest blended price for each model.
  const cheapest = m.offers.reduce((a, b) => (a.blendedPrice < b.blendedPrice ? a : b));
  console.log(`${m.modelName}: ${cheapest.provider} @ $${cheapest.blendedPrice}`);
}
FAQ
Why is the same model priced differently across providers?
Each inference provider runs its own GPU fleet, quantization strategy, and batching policy. Together and Fireworks anchor on FP8 Turbo variants for speed. DeepInfra optimizes for raw cost. Groq runs custom LPU silicon at very high throughput with a context-window trade-off. The price spread on a single model is routinely 3-10x.
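One way to see that spread yourself is to compute, per model, the ratio between the most and least expensive blended price in the matrix. A small sketch reusing the Python SDK call from the code samples above; the figures it prints reflect whatever the live matrix contains, not a fixed 3-10x.

from tensorfeed import TensorFeed

tf = TensorFeed()
matrix = tf.inference_providers(family="Meta")

for m in matrix["models"]:
    prices = [o["blendedPrice"] for o in m["offers"]]
    if len(prices) < 2:
        continue  # a spread needs at least two offers
    spread = max(prices) / min(prices)
    print(f"{m['modelName']:<28} {spread:.1f}x across {len(prices)} providers")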
How fresh is this data?
The matrix is refreshed editorially once a week. Provider pricing changes more often than embedding pricing but less often than spot-priced compute, so a weekly cadence is the right granularity.