Inference Provider Pricing
Same open-weight model, different price across Together, Fireworks, Groq, DeepInfra, OpenRouter, Replicate, Anyscale, and first-party APIs. For the same nominal weights, the spread between the cheapest and most expensive offer on a single model can be 3-10x.
Each inference provider runs its own GPU fleet and sets its own quantization strategy and batching policy. Together and Fireworks anchor on FP8 Turbo variants for speed. DeepInfra optimizes for raw cost. Groq runs custom LPU silicon for very high throughput at the cost of smaller context windows. OpenRouter routes across the others. The matrix below sorts every offer per model from cheapest to most expensive, with the lowest-blended-price row marked.
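As a rough illustration of how a blended price can be ranked (the matrix's actual weighting isn't specified here; the 3:1 input-to-output token ratio, provider names, and prices below are assumptions):

```python
# Sketch: rank offers for one model by a blended $/1M-token price.
# Assumes a 3:1 input:output token split and hypothetical per-offer fields;
# the matrix's real weighting may differ.
from dataclasses import dataclass

@dataclass
class Offer:
    provider: str
    input_price: float   # $ per 1M input tokens
    output_price: float  # $ per 1M output tokens

def blended_price(o: Offer, input_share: float = 0.75) -> float:
    """Weighted average of input and output prices per 1M tokens."""
    return input_share * o.input_price + (1 - input_share) * o.output_price

offers = [
    Offer("provider-a", 0.20, 0.80),  # illustrative numbers only
    Offer("provider-b", 0.18, 0.90),
]
cheapest = min(offers, key=blended_price)
print(cheapest.provider, round(blended_price(cheapest), 4))
```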
For agents: full matrix at /api/inference-providers. Cheapest path for one model at /api/inference-providers/cheapest?model=<id>. Free, no auth, cached 10 min.
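A minimal sketch of hitting the cheapest-path endpoint, assuming a JSON response; the base URL placeholder, model id, and response fields are illustrative, not a documented schema:

```python
# Sketch: fetch the cheapest offer for one model from the pricing API.
# The host and response shape are assumptions; no auth is needed and
# responses are cached server-side for ~10 minutes per the note above.
import requests

BASE = "https://example.com"  # placeholder; the section gives only relative paths

def cheapest_offer(model_id: str) -> dict:
    resp = requests.get(
        f"{BASE}/api/inference-providers/cheapest",
        params={"model": model_id},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    offer = cheapest_offer("meta-llama/Llama-3.1-70B-Instruct")  # example model id
    print(offer)  # e.g. provider name and per-token prices (schema assumed)
```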