Last Updated: March 2026

AI API Pricing Guide: Every Provider Compared

AI API pricing in 2026 ranges from free open-source models to $75 per million output tokens for premium models like Claude Opus. Most developers spend between $0.10 and $15 per million input tokens, depending on the model tier and use case.

AI API pricing can be confusing. Every provider uses slightly different units, some charge differently for input and output tokens, and prices change frequently. This guide breaks it all down in one place, with real cost examples so you can estimate what your project will actually cost. All prices are in USD per 1 million tokens unless noted otherwise.

Pricing Overview: All Models

Here is every major API model with its current pricing, sorted by provider. Prices are per 1 million tokens. For context, 1 million tokens is roughly 750,000 words, or about eight average-length novels.

| Provider  | Model             | Input $/1M | Output $/1M | Context |
|-----------|-------------------|------------|-------------|---------|
| Anthropic | Claude Opus 4.6   | $15.00     | $75.00      | 200K    |
| Anthropic | Claude Sonnet 4.6 | $3.00      | $15.00      | 200K    |
| Anthropic | Claude Haiku 4.5  | $0.80      | $4.00       | 200K    |
| OpenAI    | GPT-4o            | $2.50      | $10.00      | 128K    |
| OpenAI    | GPT-4o-mini       | $0.15      | $0.60       | 128K    |
| OpenAI    | o1                | $15.00     | $60.00      | 200K    |
| OpenAI    | o3-mini           | $1.10      | $4.40       | 200K    |
| Google    | Gemini 2.5 Pro    | $1.25      | $10.00      | 1M      |
| Google    | Gemini 2.0 Flash  | $0.10      | $0.40       | 1M      |
| Meta      | Llama 4 Scout     | Free*      | Free*       | 10M     |
| Meta      | Llama 4 Maverick  | Free*      | Free*       | 1M      |
| Mistral   | Mistral Large     | $2.00      | $6.00       | 128K    |
| Mistral   | Mistral Small     | $0.10      | $0.30       | 128K    |
| Cohere    | Command R+        | $2.50      | $10.00      | 128K    |
| Cohere    | Command R         | $0.15      | $0.60       | 128K    |

* Open source models are free to self-host. Hosted API pricing varies by provider (e.g., Together, Fireworks, Groq). Prices are subject to change. Check provider websites for the most current pricing.

Pricing by Provider

Anthropic

| Model             | Input  | Output | Context | Capabilities                |
|-------------------|--------|--------|---------|-----------------------------|
| Claude Opus 4.6   | $15.00 | $75.00 | 200K    | text, vision, tool-use, code |
| Claude Sonnet 4.6 | $3.00  | $15.00 | 200K    | text, vision, tool-use, code |
| Claude Haiku 4.5  | $0.80  | $4.00  | 200K    | text, vision, tool-use, code |

OpenAI

| Model       | Input  | Output | Context | Capabilities                 |
|-------------|--------|--------|---------|------------------------------|
| GPT-4o      | $2.50  | $10.00 | 128K    | text, vision, tool-use, code |
| GPT-4o-mini | $0.15  | $0.60  | 128K    | text, vision, tool-use, code |
| o1          | $15.00 | $60.00 | 200K    | text, reasoning, code        |
| o3-mini     | $1.10  | $4.40  | 200K    | text, reasoning, code        |

Google

| Model            | Input | Output | Context | Capabilities                            |
|------------------|-------|--------|---------|-----------------------------------------|
| Gemini 2.5 Pro   | $1.25 | $10.00 | 1M      | text, vision, tool-use, code, reasoning |
| Gemini 2.0 Flash | $0.10 | $0.40  | 1M      | text, vision, tool-use, code            |

Meta

| Model            | Input | Output | Context | Capabilities        |
|------------------|-------|--------|---------|---------------------|
| Llama 4 Scout    | Free* | Free*  | 10M     | text, vision, code  |
| Llama 4 Maverick | Free* | Free*  | 1M      | text, vision, code  |

Mistral

| Model         | Input | Output | Context | Capabilities                 |
|---------------|-------|--------|---------|------------------------------|
| Mistral Large | $2.00 | $6.00  | 128K    | text, vision, tool-use, code |
| Mistral Small | $0.10 | $0.30  | 128K    | text, tool-use, code         |

Cohere

| Model      | Input | Output | Context | Capabilities        |
|------------|-------|--------|---------|---------------------|
| Command R+ | $2.50 | $10.00 | 128K    | text, tool-use, RAG |
| Command R  | $0.15 | $0.60  | 128K    | text, tool-use, RAG |

Cost Calculator Examples

Abstract token prices are hard to reason about. Here are concrete examples showing what common tasks actually cost with different models. These assume typical token counts for each task type.

Example 1: Chatbot Application (10,000 conversations/month)

Assuming each conversation averages 2,000 input tokens and 1,000 output tokens:

| Model             | Input Cost | Output Cost | Total/month |
|-------------------|------------|-------------|-------------|
| Claude Opus 4.6   | $300.00    | $750.00     | $1,050.00   |
| Claude Sonnet 4.6 | $60.00     | $150.00     | $210.00     |
| GPT-4o            | $50.00     | $100.00     | $150.00     |
| Claude Haiku 4.5  | $16.00     | $40.00      | $56.00      |
| GPT-4o-mini       | $3.00      | $6.00       | $9.00       |
| Gemini 2.0 Flash  | $2.00      | $4.00       | $6.00       |

The takeaway: there is a 175x cost difference between the most expensive and cheapest options for the same workload. Choosing the right model matters enormously.
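
The arithmetic behind these examples is straightforward to reproduce. Here is a minimal cost-estimation sketch (function name and structure are my own; prices come from the tables above):

```python
def monthly_cost(requests_per_month, input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Estimate monthly API cost in USD from per-request token counts
    and per-million-token prices."""
    total_input = requests_per_month * input_tokens
    total_output = requests_per_month * output_tokens
    return (total_input * input_price_per_m +
            total_output * output_price_per_m) / 1_000_000

# Example 1 workload: 10,000 conversations at 2,000 in / 1,000 out each
opus = monthly_cost(10_000, 2_000, 1_000, 15.00, 75.00)   # 1050.0
flash = monthly_cost(10_000, 2_000, 1_000, 0.10, 0.40)    # ~6.0
print(f"Opus: ${opus:,.2f}, Flash: ${flash:,.2f}, ratio: {opus / flash:.0f}x")
```

Plugging in any row from the pricing tables gives the corresponding monthly estimate, which is how the 175x spread above falls out.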

Example 2: Document Summarization (1,000 documents/month)

Assuming each document is 10,000 input tokens and the summary is 500 output tokens:

| Model            | Total/month |
|------------------|-------------|
| Claude Opus 4.6  | $187.50     |
| Gemini 2.5 Pro   | $17.50      |
| Gemini 2.0 Flash | $1.20       |
| Mistral Small    | $1.15       |

Example 3: Code Generation (500 requests/day)

Assuming 1,500 input tokens (prompt + context) and 2,000 output tokens (generated code) per request, over a 30-day month (15,000 requests):

| Model             | Total/month |
|-------------------|-------------|
| o1 (reasoning)    | $2,137.50   |
| Claude Sonnet 4.6 | $517.50     |
| GPT-4o            | $356.25     |
| o3-mini           | $156.75     |
| GPT-4o-mini       | $21.38      |

Free Tier Comparison

Most providers offer free API access with usage limits. Here is what you get without spending anything:

| Provider        | Free Tier Details                        | Models Available                   | Limits                        |
|-----------------|------------------------------------------|------------------------------------|-------------------------------|
| OpenAI          | Free credits for new accounts            | GPT-4o-mini, GPT-3.5               | Rate limited; credit expires  |
| Anthropic       | Free credits for new accounts            | Claude Haiku, Sonnet               | Rate limited; credit expires  |
| Google          | Generous free tier via AI Studio         | Gemini 2.0 Flash, 2.5 Pro (limited)| 15 RPM for Flash; lower for Pro |
| Mistral         | Free tier available                      | Mistral Small, open models         | Rate limited                  |
| Meta (via hosts)| Free self-hosting; hosted free tiers vary| Llama 4 Scout, Maverick            | Unlimited if self-hosted      |

Pro tip: Google AI Studio offers the most generous free API access. If you are prototyping or building a low-traffic application, you can potentially run entirely on Google's free tier with Gemini 2.0 Flash.

Price Per Task Estimates

Here is roughly what common tasks cost per individual request using different model tiers. These are estimates based on typical token counts.

| Task                  | Tokens (in/out) | Frontier Model | Mid-tier Model | Budget Model |
|-----------------------|-----------------|----------------|----------------|--------------|
| Summarize an article  | 3K / 300        | $0.067         | $0.014         | $0.0006      |
| Translate 1 page      | 500 / 600       | $0.052         | $0.011         | $0.0004      |
| Generate a function   | 1K / 500        | $0.053         | $0.011         | $0.0005      |
| Write a blog post     | 500 / 3K        | $0.233         | $0.047         | $0.0019      |
| Analyze a spreadsheet | 10K / 1K        | $0.225         | $0.045         | $0.0021      |
| Chat response (avg)   | 2K / 500        | $0.068         | $0.014         | $0.0006      |

Frontier model = Claude Opus 4.6 / o1. Mid-tier = Claude Sonnet 4.6 / GPT-4o. Budget = GPT-4o-mini / Gemini Flash.

Tips for Reducing API Costs

API costs can add up quickly, especially at scale. Here are practical strategies for keeping them under control:

1. Use the smallest model that works

This is the single most impactful optimization. For many tasks, GPT-4o-mini or Gemini Flash produces results that are nearly as good as frontier models at a fraction of the cost. Test your use case with cheaper models first and only upgrade if quality is genuinely insufficient. A model that is 10x cheaper and 95% as good is almost always the right choice.

2. Implement caching

If users ask similar questions, cache the responses. Both Anthropic and OpenAI offer prompt caching features that can reduce costs by up to 90% for repeated prefixes. Even simple application-level caching (storing responses for identical inputs) can save significant money.
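
A minimal application-level cache can be sketched in a few lines. This is illustrative only: `cached_completion` and `call_api` are placeholder names, and a production cache would add expiry and size limits:

```python
import hashlib

_cache = {}

def cached_completion(model, prompt, call_api):
    """Serve identical (model, prompt) pairs from memory; only call the
    provider on a cache miss. `call_api` is whatever function actually
    hits the API."""
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(model, prompt)
    return _cache[key]

# Stubbed "API" that records how many real calls were made
calls = []
def fake_api(model, prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"

cached_completion("gpt-4o-mini", "What is a token?", fake_api)
cached_completion("gpt-4o-mini", "What is a token?", fake_api)
print(len(calls))  # 1 -- the second request never reached the API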

3. Optimize your prompts

Shorter prompts cost less. Remove unnecessary instructions, examples, and context. Use system prompts efficiently. If you are including few-shot examples, test whether you really need all of them. Often 1-2 examples work nearly as well as 5-6.

4. Set max token limits

Always set a max_tokens parameter to prevent unexpectedly long (and expensive) responses. For a summarization task, you probably do not need more than 500 output tokens. For code generation, 2,000 is usually plenty.
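
Because output tokens are billed per token generated, `max_tokens` gives you a hard ceiling on per-request output cost. A quick sketch of that bound (function name is my own):

```python
def max_output_cost(max_tokens, output_price_per_m):
    """Upper bound on what a single response can cost in output tokens."""
    return max_tokens * output_price_per_m / 1_000_000

# Capping a summarization task at 500 tokens on Claude Opus 4.6 ($75/1M out)
print(max_output_cost(500, 75.00))  # 0.0375
```

Without the cap, a runaway response could generate tens of thousands of tokens at the same rate.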

5. Use model routing

Route different requests to different models based on complexity. Simple questions go to a cheap model; complex ones go to a frontier model. You can implement this with a classifier (which itself can be a cheap model) or with simple heuristics based on input length or keywords.
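
A toy heuristic router might look like this. The model names, length threshold, and keyword list are all illustrative assumptions, not recommendations:

```python
def pick_model(prompt, hard_keywords=("prove", "debug", "analyze", "refactor")):
    """Route long or keyword-flagged prompts to a frontier model and
    everything else to a budget model."""
    if len(prompt) > 2_000 or any(k in prompt.lower() for k in hard_keywords):
        return "frontier-model"
    return "budget-model"

print(pick_model("What time zone is UTC+2?"))        # budget-model
print(pick_model("Debug this race condition: ..."))  # frontier-model
```

In practice you would tune the heuristics (or replace them with a cheap classifier model) against your own traffic.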

6. Batch your requests

Both OpenAI and Anthropic offer batch APIs with 50% discounts. If your use case does not require real-time responses (e.g., processing a backlog of documents), batching can cut your costs in half.

7. Consider open source models

For high-volume applications, self-hosting an open source model like Llama 4 or Mistral can be dramatically cheaper than API calls. The upfront infrastructure cost is higher, but per-request costs approach zero. See our open source LLM guide for details.

Understanding Tokens

Tokens are the fundamental unit of AI API pricing. A token is roughly three-quarters of a word in English. Here are some helpful benchmarks:

  • 1 token = roughly 4 characters or 0.75 words in English
  • 100 tokens = roughly 75 words (a short paragraph)
  • 1,000 tokens = roughly 750 words (about 1.5 pages)
  • 10,000 tokens = roughly 7,500 words (a long article)
  • 100,000 tokens = roughly 75,000 words (a short novel)
  • 1,000,000 tokens = roughly 750,000 words (several novels)
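
The rules of thumb above translate directly into a quick back-of-envelope estimator (a rough heuristic only; real tokenizers like tiktoken give exact counts):

```python
def estimate_tokens(text=None, words=None):
    """Rough token estimate using the guide's rules of thumb:
    ~4 characters per token, or ~0.75 words per token."""
    if text is not None:
        return round(len(text) / 4)
    return round(words / 0.75)

print(estimate_tokens(words=750_000))  # 1000000
print(estimate_tokens(text="Tokens are the fundamental unit."))  # 8
```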

Important: input tokens and output tokens are priced differently, with output tokens typically costing 2-5x more than input tokens. This is because generating text is more computationally intensive than processing it. When estimating costs, always account for both sides.

Frequently Asked Questions

How much does the OpenAI API cost?

OpenAI API pricing varies by model. GPT-4o costs $2.50 per 1M input tokens and $10 per 1M output tokens. GPT-4o-mini is much cheaper at $0.15/$0.60. The o1 reasoning model costs $15/$60.

What is the cheapest AI API?

Google's Gemini 2.0 Flash is one of the cheapest at $0.10 per 1M input tokens. Open-source models like Llama 4 are free to self-host. Groq offers fast inference at competitive prices.

How are AI API tokens counted?

Roughly, 1 token equals about 4 characters or 0.75 words in English. A 1,000-word document is approximately 1,333 tokens. Most APIs charge separately for input (prompt) and output (completion) tokens.

Which AI API is best for production?

For reliability and quality, Anthropic (Claude) and OpenAI (GPT-4o) are the most popular choices. For cost-sensitive applications, Gemini Flash or self-hosted open-source models offer the best value.

Related Resources