
Inference Cheapest

Free
GET /api/inference-providers/cheapest

Agent-friendly entry point into the inference provider matrix. Pass a canonical model id and get back the three cheapest offers, sorted by blended price by default. It skips the full matrix payload, which is useful when an agent is just picking the cheapest path and does not need every column.

When to use this endpoint

When your agent needs the cheapest inference path for a specific open-weight model in a single call. For a different sort, pass ?sort=input|output|tps_desc.

Parameters

Name    In     Type    Description
model*  query  string  Canonical model id: llama-4-maverick, llama-4-scout, llama-3.1-70b, llama-3.1-405b, deepseek-v4-pro, deepseek-v4-flash, mixtral-8x22b, qwen-2.5-72b. Example: llama-4-scout
sort    query  string  Sort order: blended (default), input, output, tps_desc. Example: tps_desc

* required
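For raw HTTP callers, the query string is just the two parameters above. A minimal sketch of a URL builder (the `cheapest_url` helper is hypothetical, not part of any SDK) that omits `sort` when it matches the API default:

```python
from urllib.parse import urlencode

BASE = "https://tensorfeed.ai/api/inference-providers/cheapest"

def cheapest_url(model: str, sort: str = "blended") -> str:
    """Build the request URL; sort is omitted when it is the default (blended)."""
    params = {"model": model}
    if sort != "blended":
        params["sort"] = sort
    return f"{BASE}?{urlencode(params)}"

print(cheapest_url("llama-4-scout", sort="tps_desc"))
```

Fetching the resulting URL with any HTTP client returns the JSON shown in the example response below.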

Example response

{
  "ok": true,
  "modelId": "llama-4-scout",
  "modelName": "Llama 4 Scout",
  "family": "Meta",
  "sortBy": "blended",
  "cheapest": { "provider": "DeepInfra", "blendedPrice": 0.355, "inputPrice": 0.16, "outputPrice": 0.55, "contextWindow": 10000000, "outputTPS": 170 },
  "top3": [
    { "provider": "DeepInfra", "blendedPrice": 0.355 },
    { "provider": "OpenRouter", "blendedPrice": 0.385 },
    { "provider": "Groq", "blendedPrice": 0.385 }
  ]
}

Code samples

Python SDK

from tensorfeed import TensorFeed
tf = TensorFeed()
result = tf.inference_cheapest("llama-4-scout")
print(f"Cheapest: {result['cheapest']['provider']} at ${result['cheapest']['blendedPrice']:.3f}/1M blended")

TypeScript (fetch)

const res = await fetch("https://tensorfeed.ai/api/inference-providers/cheapest?model=llama-4-scout");
if (!res.ok) throw new Error(`cheapest lookup failed: HTTP ${res.status}`);
const result = await res.json();
console.log(`Cheapest: ${result.cheapest.provider} @ $${result.cheapest.blendedPrice}`);

FAQ

What if my model is not in the matrix?

The endpoint returns 404 model_not_found. List available models at /api/inference-providers. We track the most-served open-weight models; the catalog is curated editorially, so if you need a model we do not cover, we add new models on demand.
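An agent calling this endpoint should branch on that status code rather than assume a payload. A sketch of the fallback logic (the `resolve_cheapest` helper is hypothetical, written for illustration):

```python
def resolve_cheapest(status_code: int, payload: dict):
    """Return the cheapest offer, or None when the model is not in the matrix."""
    if status_code == 404:
        # model_not_found: caller should list models at /api/inference-providers
        return None
    if not payload.get("ok"):
        raise RuntimeError(f"unexpected API error: {payload}")
    return payload["cheapest"]
```

On `None`, the caller can fetch the model list and retry with a canonical id.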

Why is the sort default blended and not input?

Because real workloads have non-zero output usage. Blended at 1:1 input:output ratio is a better proxy for actual cost than input-only. If your workload is heavy-input or heavy-output, pass ?sort=input or ?sort=output.
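At a 1:1 ratio, blended price works out to a simple average of the input and output per-Mtok prices, which is consistent with the llama-4-scout example response above:

```python
# Blended price at a 1:1 input:output ratio is the average of the two prices.
input_price, output_price = 0.16, 0.55  # DeepInfra llama-4-scout, $/Mtok
blended = (input_price + output_price) / 2
print(blended)  # 0.355, matching cheapest.blendedPrice in the example response
```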

Related endpoints