GPT-5.5 Just Landed. OpenAI Doubled the Price and Raised the Bar.
OpenAI shipped GPT-5.5 yesterday. It is the first fully retrained base model since GPT-4.5, and the benchmarks are genuinely impressive. It also costs twice as much as GPT-5.4. That is not a typo. OpenAI is betting that raw capability justifies a price hike in a market where everyone else has been racing to the bottom.
I've spent the last 24 hours running it through our tracking pipeline and comparing the numbers. Here is what we know so far.
What GPT-5.5 Actually Is
GPT-5.5 is not an incremental update. OpenAI describes it as a "complete retrain" on a new data mix and architecture revision, the first since GPT-4.5 landed in early 2025. The 5.1 through 5.4 releases were all fine-tuned variants of the 5.0 base. This one is a fresh foundation.
The headline specs: 1 million token native context window. Natively omnimodal, meaning it handles text, images, audio, and video in a single forward pass rather than routing through separate encoders. Available immediately to Plus, Pro, Business, and Enterprise subscribers, and in the API for all developers.
One detail that stands out: OpenAI claims GPT-5.5 uses 40% fewer tokens than GPT-5.4 to complete equivalent tasks. If that holds up in production, it partially offsets the price increase. A model that costs 2x per token but uses 40% fewer tokens effectively costs about 1.2x for the same workload (2 × 0.6 = 1.2). Not cheap, but not the sticker shock it first appears.
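The back-of-envelope math above can be sketched in a few lines. This is just an illustration of the arithmetic, assuming OpenAI's claimed 40% token reduction holds uniformly; real savings will vary by workload.

```python
def effective_cost_multiplier(price_ratio: float, token_reduction: float) -> float:
    """Cost of the new model relative to the old, for the same task.

    price_ratio: new per-token price / old per-token price.
    token_reduction: claimed fraction of tokens saved (0.40 = 40% fewer).
    """
    return price_ratio * (1 - token_reduction)

# GPT-5.5 vs GPT-5.4: 2x the per-token price, 40% fewer tokens (claimed).
print(effective_cost_multiplier(2.0, 0.40))  # → 1.2
```

If the token reduction turns out to be, say, 20% on your workload, the same formula gives a 1.6x effective cost instead, which is why measuring it yourself matters.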
Latency is also worth noting. Despite being a larger, more capable model, OpenAI says GPT-5.5 matches GPT-5.4's per-token latency. That suggests significant inference optimization work under the hood, likely involving new speculative decoding techniques and hardware-specific tuning for their latest GPU clusters.
The Pricing: $5 In, $30 Out
Let's talk about the elephant in the room. GPT-5.5 is priced at $5 per million input tokens and $30 per million output tokens. GPT-5.4 was $2.50/$15. That is a clean 2x increase across the board.
There's also a GPT-5.5 Pro tier at $30 input and $180 output, presumably with higher rate limits and priority access. That puts it well above even Anthropic's Opus pricing.
| Model | Input (per 1M) | Output (per 1M) | Context |
|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | 1M |
| GPT-5.5 Pro | $30.00 | $180.00 | 1M |
| GPT-5.4 (previous) | $2.50 | $15.00 | 128K |
| Claude Opus 4.7 | $15.00 | $75.00 | 200K |
| Gemini 3.1 Pro | $1.25 | $5.00 | 2M |
You can model the real cost impact for your workloads on our cost calculator. The token efficiency gains matter a lot here, so run the numbers before you react to the sticker price alone.
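As a rough illustration of that math, here is a minimal sketch of a per-workload cost comparison using the per-1M-token prices from the table above. This is not our actual calculator, and the traffic numbers in the example are hypothetical.

```python
# (input $/1M tokens, output $/1M tokens), from the pricing table above.
PRICES = {
    "GPT-5.5":         (5.00, 30.00),
    "GPT-5.5 Pro":     (30.00, 180.00),
    "GPT-5.4":         (2.50, 15.00),
    "Claude Opus 4.7": (15.00, 75.00),
    "Gemini 3.1 Pro":  (1.25, 5.00),
}

def workload_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Dollar cost of a workload, given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical month: 200M input tokens, 50M output tokens.
for model in PRICES:
    print(f"{model:16s} ${workload_cost(model, 200e6, 50e6):,.2f}")
```

On that hypothetical traffic, GPT-5.5 lands at $2,500/month versus $1,250 for GPT-5.4 before any token-efficiency savings; factor in the claimed 40% reduction and the gap narrows to roughly $1,500 versus $1,250.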
The Benchmarks Tell the Story
This is where GPT-5.5 earns its price tag. The benchmark scores are not incremental improvements. They are category-leading across the board.
| Benchmark | GPT-5.5 | Notes |
|---|---|---|
| Terminal-Bench 2.0 | 82.7% | New high across all models |
| SWE-Bench Pro | 58.6% | Software engineering tasks |
| Expert-SWE | 73.1% | Advanced engineering problems |
| FrontierMath Tier 4 | 35.4% | Double Opus 4.7's score |
| Artificial Analysis Intelligence Index | 60 | Top of the leaderboard |
The FrontierMath Tier 4 result is the one that jumps out. 35.4% is double what Claude Opus 4.7 achieves on the same benchmark. FrontierMath Tier 4 is designed to test graduate-level mathematical reasoning, the kind of problems where most models score in the low teens. Doubling the best competitor is not a marginal win.
The Artificial Analysis Intelligence Index score of 60 puts GPT-5.5 at the top of their overall leaderboard, which aggregates performance across reasoning, coding, math, and knowledge tasks. You can see how this compares to other models on our benchmarks page.
Terminal-Bench 2.0 at 82.7% is also notable. This benchmark tests real-world terminal and CLI operations, and GPT-5.5 is the first model to break 80%. For developers building agentic coding tools, that number matters.
Where This Leaves the Competition
GPT-5.5 creates an interesting split in the market. OpenAI is now running a two-tier strategy: GPT-5.4 and its Mini variant for cost-sensitive production workloads, and GPT-5.5 as the premium flagship for tasks where capability matters more than cost.
Anthropic's Claude Opus 4.7 remains the most expensive option at $15/$75, but it now faces a competitor that outperforms it on multiple benchmarks at a third of the input price and 40% of the output price. The Opus line has traditionally justified its premium through superior reasoning on complex, multi-step tasks. That story gets harder to tell when GPT-5.5 doubles your FrontierMath score for $5 input instead of $15.
Google's Gemini 3.1 Pro still owns the value end of the market at $1.25 input with a 2 million token context window. GPT-5.5's 1M context is impressive but still half of what Gemini offers. For pure context length per dollar, Google remains untouchable.
The real question is whether GPT-5.5's token efficiency claim holds up. If it genuinely uses 40% fewer tokens on real workloads, the effective cost gap with GPT-5.4 narrows considerably, and it becomes competitive with Claude Sonnet 4.6 on a per-task basis despite the higher per-token rate.
Our Take
OpenAI is making a bet that the market will pay more for genuinely better models. For the past year, the entire industry has been in a race to cut prices. GPT-5.5 is the first major release to reverse that trend, and the benchmark results suggest it might be justified.
For most production applications, GPT-5.4 and GPT-5.4 Mini are still the right choice on cost alone. But for agentic workflows, complex reasoning tasks, and applications where accuracy on hard problems directly impacts value, GPT-5.5 looks like it earns the premium.
We are adding GPT-5.5 to our models tracker and cost calculator today. We will be watching the independent benchmark reproductions closely over the next week. OpenAI's self-reported numbers are strong, but third-party validation is what counts.
One thing is clear: the pricing floor discussion just got more complicated. Cheap models are getting cheaper. But the ceiling is moving up too, and OpenAI is betting that developers will pay for it. Based on what we have seen so far, they might be right.