The Cheapest AI Model on the Market Costs 1.7 Cents per Million Tokens
I pulled the live OpenRouter catalog this afternoon. 372 models routed across 50-plus providers, normalized to the same per-token pricing schema. The shape of the inference market in one snapshot. And the headline number is small: $0.017 per million input tokens for IBM's granite-4.0-h-micro, currently the cheapest paid model on OpenRouter.
That is one and seven-tenths of a US cent for one million tokens of inference. A million tokens is roughly 750,000 words. So you can run a token through a model and pay less than two millionths of a cent for the privilege. The inference floor is now too cheap to think about.
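You can reproduce the floor number yourself; the catalog endpoint is public and unauthenticated. A minimal sketch, assuming the pricing fields in OpenRouter's /api/v1/models response are USD-per-token strings, which is how the public schema reads at the time of writing:

```python
# Pull the live OpenRouter catalog and find the cheapest paid model by
# prompt price. Assumes pricing fields are USD-per-token strings.
import requests

catalog = requests.get("https://openrouter.ai/api/v1/models", timeout=30).json()["data"]

paid = [m for m in catalog if float(m["pricing"]["prompt"]) > 0]
cheapest = min(paid, key=lambda m: float(m["pricing"]["prompt"]))

per_token = float(cheapest["pricing"]["prompt"])  # dollars per single token
print(cheapest["id"], f"${per_token * 1_000_000:.3f} per million input tokens")
# At $0.017/M, one token costs 1.7e-8 dollars, i.e. ~1.7 millionths of a cent.
```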
372 Models, 33 of Them Free
The most-cited number in any AI pricing piece is "flagship per-token cost." That number tells you what Anthropic and OpenAI charge for their best stuff and not much else. The catalog tells a different story. Below the flagships there is an enormous shelf of capable, cheap, and increasingly free models that handle 80% of real workloads.
Snapshot of the top namespaces today: OpenAI lists 64 models on OpenRouter, Qwen lists 51, Google lists 31, Mistral 25, Anthropic 14, Meta 14, DeepSeek 13, Z.ai 13. The open-weight providers (Qwen, Mistral, DeepSeek, Meta, Z.ai) account for 116 models between them. The proprietary frontier is a thin layer on top of a dense open-source middle.
And 33 of those models are priced at zero. Free tier. Real free, not promotional-credits free. Google's Gemma family, Meta's Llama Guard, Baidu's Qianfan OCR, several distillations of frontier models, Mistral's smaller variants. If you are willing to use a smaller or more permissively licensed model, the marginal cost of inference is literally nothing.
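Counting the free shelf is the same catalog pull plus one filter. Continuing from the sketch above:

```python
# The free shelf: models whose prompt and completion prices are both zero.
free = [
    m["id"] for m in catalog
    if float(m["pricing"]["prompt"]) == 0
    and float(m["pricing"]["completion"]) == 0
]
print(len(free), "free models")  # 33 on the day of this snapshot
```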
What the Floor Tells You About the Ceiling
Anthropic charges $15 per million input tokens for Claude Opus 4.7. The cheapest paid model on OpenRouter is roughly 880 times cheaper. An 880x spread between cheapest and most expensive is not a market with a clean substitute curve. It is a market with two distinct products that share a name.
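The multiple is one line of arithmetic against the same data. Reusing the `paid` list from the first sketch, with the Opus list price hardcoded:

```python
# The headline spread: Opus list price over the cheapest paid prompt price.
opus_per_million = 15.0
floor_per_million = min(float(m["pricing"]["prompt"]) for m in paid) * 1_000_000
print(f"{opus_per_million / floor_per_million:.0f}x")  # 15 / 0.017 -> ~882x
```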
One product is "raw text-token transformation," which is becoming a commodity. Open-weight models running on cloud GPUs at scale, dispatched through OpenRouter or directly via Together or Fireworks, can do this for a cent or two per million tokens. The price floor is set by the cost of GPU minutes, not by the cost of training the model. That floor will keep falling as GPU rental rates fall and as smaller distillations get good enough.
The other product is "intelligence on the hard problem." SWE-bench scores of 65+, 200K+ context with no degradation, multi-turn tool use that does not lose the thread, careful refusal behavior. Anthropic, OpenAI, and Google sell that product. They will keep charging premium prices for it as long as the gap to the open shelf is large enough to justify the spread.
The interesting question for 2026 is how fast the gap closes. DeepSeek R1 shocked everyone by closing it on reasoning at a tenth of the price. Qwen has been closing it on Chinese-language benchmarks. The frontier labs respond by extending the spread on capabilities the open shelf cannot easily replicate (1M+ context, agentic reliability, multimodal). So far the spread has held. But it has narrowed.
Practical Numbers Worth Remembering
Some price points from today's pull, for the next time someone asks you what AI inference costs:
- Cheapest paid input: IBM's granite-4.0-h-micro at $0.017 per million tokens. 131K context window.
- Cheapest paid output: Meta's Llama Guard 3 8B at $0.03 per million tokens. Designed for content classification.
- Cheap-but-capable workhorses: Mistral Nemo at $0.02 input / $0.03 output, Llama 3.1 8B at $0.02 / $0.05, Qwen Turbo at $0.03 / $0.13. All with 16K-131K context. Each costs less per million tokens than a stamp.
- Free with reasonable quality: Google's Gemma 3 family in 4B, 12B, and 27B sizes. The 27B model has 131K context at zero cost.
- Largest context window in the catalog: OpenRouter's auto-routed model at 2,000,000 tokens. Two million tokens is roughly a copy of every Tolkien novel, with room to spare.
What this list says, more loudly than any headline: if you are building agentic workflows that fan out work across many small tasks (parsing, summarization, classification, intent detection, embedding generation), you should not be paying flagship rates for those steps. You should route them to the cheap shelf and reserve the flagship for the steps that actually need it.
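A hypothetical shape for that routing layer, to make the idea concrete. The cheap-shelf slug is one of the workhorses from the list above; the flagship slug is a placeholder for whatever your hard steps actually need, and both should be checked against the live catalog before you wire anything up:

```python
import os
import requests

# Two-tier router sketch: mechanical fan-out steps go to the cheap shelf,
# hard reasoning goes to the flagship. Model slugs are illustrative.
CHEAP_TASKS = {"parse", "summarize", "classify", "detect_intent"}
CHEAP_MODEL = "mistralai/mistral-nemo"   # ~$0.02/M input on this snapshot
FLAGSHIP_MODEL = "your/flagship-model"   # placeholder: whatever your hard steps need

def route(task_kind: str, messages: list[dict]) -> str:
    model = CHEAP_MODEL if task_kind in CHEAP_TASKS else FLAGSHIP_MODEL
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={"model": model, "messages": messages},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

The point of the split is economic, not architectural: a summarization step that runs ten thousand times a day belongs on the $0.02 shelf, and the flagship bill should track only the steps where the spread buys you something.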
What I Do With This Data
On TensorFeed I built two endpoints around the OpenRouter catalog because it is the most useful inference snapshot we publish. /api/openrouter/models gives you the raw JSON of all 372 models with normalized pricing. /api/today bundles a summary of the catalog into a single morning brief alongside news, papers, and HF trending.
Both are free, no auth. If you build a routing layer for your own agent and want a daily-refreshed source for "cheapest model for X," that data is sitting there. If you publish your own pricing comparison, our snapshot is the closest thing to a daily census. The data acquisition is the moat, and we are giving the moat away.
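What pulling from those endpoints looks like, sketched under two assumptions: the base URL here is a placeholder for the site's actual domain, and the response is assumed to mirror OpenRouter's raw JSON (pricing as USD-per-token strings), as described above:

```python
# Consume the daily snapshot and answer "cheapest paid model with >=100K
# context." BASE is a placeholder; swap in the real domain.
import requests

BASE = "https://tensorfeed.example"  # placeholder base URL

snapshot = requests.get(f"{BASE}/api/openrouter/models", timeout=30).json()["data"]

candidates = [
    m for m in snapshot
    if m.get("context_length", 0) >= 100_000
    and float(m["pricing"]["prompt"]) > 0
]
best = min(candidates, key=lambda m: float(m["pricing"]["prompt"]))
print(best["id"])
```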
The thesis is simple. The AI ecosystem is too fragmented for any one team to track, and the prices change too often for any cached page to stay accurate. So we capture the snapshot every day, expose it on a stable URL, and let the agents and humans who need it pull what they need. That is what TensorFeed is, and the pricing data is one of the cleanest examples.
Tomorrow these numbers will be different. The cheapest model will be cheaper, the catalog will be larger, the namespaces will reshuffle. The thing that will stay the same is the direction. Down and out and free.
Live data: /api/openrouter/models. Companion piece: The AI API Pricing War of 2026.