Inference Provider Matrix
Free · GET /api/inference-providers

The /api/inference-providers endpoint returns the cross-provider pricing matrix for open-weight models: the same Llama 4 Maverick / Scout, DeepSeek V4, Mixtral, and Qwen 2.5 weights served at different prices across 8 hosted providers. Each offer carries the input price, output price, blended price, output TPS, the context window the provider serves, feature flags (function calling, JSON mode, vision), and a link to the provider's docs.
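If you are not using an SDK, the endpoint is a plain HTTPS GET. Below is a minimal sketch using Python's requests library against the same public base URL as the TypeScript sample further down; it assumes the free tier needs no authentication header, since none is documented on this page.

import requests

# Fetch the cross-provider matrix, filtered to Meta-family models.
resp = requests.get(
    "https://tensorfeed.ai/api/inference-providers",
    params={"family": "Meta"},
    timeout=10,
)
resp.raise_for_status()
matrix = resp.json()

# Each model entry carries one offer per provider that serves it.
for model in matrix["models"]:
    for offer in model["offers"]:
        print(model["modelId"], offer["provider"], offer["blendedPrice"])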
When to use this endpoint
Use it when your agent is picking the cheapest hosted inference path for an open-weight model. For a single-model lookup, use /api/inference-providers/cheapest instead so you do not need to pull the full matrix.
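A sketch of that single-model lookup follows. The /api/inference-providers/cheapest endpoint is documented separately; the model query parameter name and the response shape assumed here are for illustration only.

import requests

# Hypothetical call to the single-model endpoint; the "model" parameter
# name is an assumption, not confirmed by this page.
resp = requests.get(
    "https://tensorfeed.ai/api/inference-providers/cheapest",
    params={"model": "llama-4-scout"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())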
Parameters
| Name | In | Type | Description |
|---|---|---|---|
| family | query | string | Filter by origin lab (Meta, DeepSeek, Mistral, Alibaba), e.g. Meta |
Example response
{
  "ok": true,
  "lastUpdated": "2026-04-30",
  "tracked_providers": ["Together AI", "Fireworks", "DeepInfra", "Groq", "OpenRouter", "Replicate", "Anyscale", "DeepSeek"],
  "models": [
    {
      "modelId": "llama-4-scout",
      "modelName": "Llama 4 Scout",
      "family": "Meta",
      "paramsB": 109,
      "license": "Llama 4 Community License",
      "openWeights": true,
      "offers": [
        { "provider": "DeepInfra", "inputPrice": 0.16, "outputPrice": 0.55, "blendedPrice": 0.355, "contextWindow": 10000000, "outputTPS": 170, "features": ["function-calling", "vision"] }
      ]
    }
  ]
}
Code samples
Python SDK
from tensorfeed import TensorFeed

tf = TensorFeed()
matrix = tf.inference_providers(family="Meta")

# Print the cheapest hosted offer per model, ranked by blended price.
for m in matrix["models"]:
    cheapest = min(m["offers"], key=lambda o: o["blendedPrice"])
    print(f"{m['modelName']:<28} {cheapest['provider']:<14} ${cheapest['blendedPrice']:.3f}")
TypeScript SDK
const res = await fetch("https://tensorfeed.ai/api/inference-providers?family=Meta");
const { models } = await res.json();

for (const m of models) {
  // Pick the offer with the lowest blended price for each model.
  const cheapest = m.offers.reduce((a, b) => (a.blendedPrice < b.blendedPrice ? a : b));
  console.log(`${m.modelName}: ${cheapest.provider} @ $${cheapest.blendedPrice}`);
}
FAQ
Why is the same model priced differently across providers?
Each inference provider runs its own GPU fleet, quantization strategy, and batching policy. Together and Fireworks anchor on FP8 Turbo variants for speed. DeepInfra optimizes for raw cost. Groq runs custom LPU silicon at very high throughput with a context-window trade-off. The price spread on a single model is routinely 3-10x.
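One way to see that spread yourself is to compute, per model, the ratio between the most and least expensive blended price in the matrix. A small sketch reusing the Python SDK call from the code samples above; the figures it prints reflect whatever the live matrix contains, not a fixed 3-10x.

from tensorfeed import TensorFeed

tf = TensorFeed()
matrix = tf.inference_providers(family="Meta")

for m in matrix["models"]:
    prices = [o["blendedPrice"] for o in m["offers"]]
    if len(prices) < 2:
        continue  # a spread needs at least two offers
    spread = max(prices) / min(prices)
    print(f"{m['modelName']:<28} {spread:.1f}x across {len(prices)} providers")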
How fresh is this data?
The matrix is refreshed editorially once a week. Provider pricing changes more often than embedding pricing but less often than spot-priced compute, so a weekly cadence is the right granularity.