Inference Provider Pricing
Same open-weight model, different price across Together, Fireworks, Groq, DeepInfra, OpenRouter, Replicate, Anyscale, and first-party APIs. For the same nominal weights, the spread between the cheapest and most expensive offer on a single model can be 3-10x.
Each inference provider runs its own GPU fleet and sets its own quantization strategy and batching policy. Together and Fireworks anchor on FP8 Turbo variants for speed. DeepInfra optimizes for raw cost. Groq runs custom LPU silicon for very high throughput at the cost of smaller context windows. OpenRouter routes across the others. The matrix below sorts every offer per model from cheapest to most expensive, with the lowest-blended-price row marked.
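As a rough illustration of how a blended price can be ranked (the matrix's actual weighting isn't specified here; the 3:1 input-to-output token ratio, provider names, and prices below are assumptions):

```python
# Sketch: rank offers for one model by a blended $/1M-token price.
# Assumes a 3:1 input:output token split and hypothetical per-offer fields;
# the matrix's real weighting may differ.
from dataclasses import dataclass

@dataclass
class Offer:
    provider: str
    input_price: float   # $ per 1M input tokens
    output_price: float  # $ per 1M output tokens

def blended_price(o: Offer, input_share: float = 0.75) -> float:
    """Weighted average of input and output prices per 1M tokens."""
    return input_share * o.input_price + (1 - input_share) * o.output_price

offers = [
    Offer("provider-a", 0.20, 0.80),  # illustrative numbers only
    Offer("provider-b", 0.18, 0.90),
]
cheapest = min(offers, key=blended_price)
print(cheapest.provider, round(blended_price(cheapest), 4))
```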
For agents: full matrix at /api/inference-providers. Cheapest path for one model at /api/inference-providers/cheapest?model=<id>. Free, no auth, cached 10 min.
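A minimal sketch of hitting the cheapest-path endpoint, assuming a JSON response; the base URL placeholder, model id, and response fields are illustrative, not a documented schema:

```python
# Sketch: fetch the cheapest offer for one model from the pricing API.
# The host and response shape are assumptions; no auth is needed and
# responses are cached server-side for ~10 minutes per the note above.
import requests

BASE = "https://example.com"  # placeholder; the section gives only relative paths

def cheapest_offer(model_id: str) -> dict:
    resp = requests.get(
        f"{BASE}/api/inference-providers/cheapest",
        params={"model": model_id},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    offer = cheapest_offer("meta-llama/Llama-3.1-70B-Instruct")  # example model id
    print(offer)  # e.g. provider name and per-token prices (schema assumed)
```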