
Claude Opus 4.7 Just Dropped. Here's What Changed.

Ripper · 6 min read

Anthropic shipped Claude Opus 4.7 this morning. If you blinked, you might have missed it. No keynote, no teaser video, no countdown timer. Just a blog post, an updated model card, and a new API identifier. That is how Anthropic does releases now. The story is in the spec sheet.

So let me walk you through what actually changed, because on paper this looks like a point release and in practice it is closer to a new tier.

The Big One: 1M Token Context

Opus 4.6 maxed out at 200,000 tokens. Opus 4.7 goes to 1,000,000. That is a 5x jump in the middle of what most people thought was a mature product line. And the price held steady at $15 per million input tokens and $75 per million output tokens.

One million tokens is enough context to fit roughly 750,000 words. You can drop an entire medium-sized codebase into a single call. You can paste in multiple books. You can feed it an hour of transcribed audio without hitting the wall. For most developers I talk to, the 200K limit was the single biggest constraint on what they could ask Claude to do. That constraint is gone.
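To see what that unlocks, note that the call itself does not change shape. Here is a minimal sketch using the Anthropic Python SDK that stuffs a whole project into one request. The model identifier and the project path are assumptions for illustration; check the model page for the exact string.

```python
# Minimal long-context sketch with the Anthropic Python SDK.
# "claude-opus-4-7" and the "my_project" path are assumed for illustration.
import pathlib

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Concatenate a whole codebase into one prompt; at 1M tokens this fits
# projects that previously needed chunking or retrieval.
codebase = "\n\n".join(
    f"# {path}\n{path.read_text()}"
    for path in sorted(pathlib.Path("my_project").rglob("*.py"))
)

message = client.messages.create(
    model="claude-opus-4-7",  # assumed identifier
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"Here is the full codebase:\n\n{codebase}\n\n"
                   "List every place where an exception is silently swallowed.",
    }],
)
print(message.content[0].text)
```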

Gemini 2.5 Pro has had 1M context for a while. Llama 4 Maverick does too. What is new is that Claude, the model that leads on reasoning and code, now has it as well. Context length was the last remaining reason to pick Gemini over Claude for long-context work. Google still wins on price, but not on capability.

Benchmark Gains Are Modest But Consistent

Here is the honest truth: the improvements from 4.6 to 4.7 are small, on the order of 1 to 3 points across most tests. MMLU-Pro climbs from 92.4 to 93.8. HumanEval moves from 95.1 to 96.2. GPQA Diamond lifts from 74.2 to 76.5. MATH bumps from 91.8 to 93.1. SWE-bench, the one that measures real engineering tasks, jumps from 62.3 to 65.4.

SWE-bench is the number worth watching. A 3-point gain on real-world GitHub issue resolution is not nothing: it means roughly three more resolved tickets out of every hundred you hand the model. At the scale of a developer team, that translates into meaningful time saved.

The rest of the gains look cosmetic on a chart and material in practice, because benchmark scores compound. A model that is 2 points better on reasoning, 1 point better on math, and 3 points better on code is noticeably better at actual agentic work, where all three matter at once and errors multiply across steps.
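To see why, run the arithmetic. The per-step accuracies below are hypothetical, but the compounding effect is real: on a task that chains many model calls, small per-step gains turn into large end-to-end gains.

```python
# Hypothetical per-step accuracies; the point is how they compound.
def end_to_end(per_step: float, steps: int) -> float:
    """Chance that every step in a sequential chain succeeds."""
    return per_step ** steps

for steps in (5, 10, 20):
    old, new = end_to_end(0.95, steps), end_to_end(0.97, steps)
    print(f"{steps:>2} steps: {old:.0%} -> {new:.0%}")
# 5 steps: 77% -> 86%, 10 steps: 60% -> 74%, 20 steps: 36% -> 54%
```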

What This Means for Pricing Pressure

Anthropic held the line on pricing. $15 input, $75 output. Same as 4.6. Same as 4.5. Same as Claude 3 Opus a full year ago. While Google, Mistral, and the open source world have been dropping prices, Anthropic sits at the top of the market and does not flinch.
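For concreteness, here is the back-of-envelope math on those rates. The example token counts are made up; only the per-million prices come from the pricing above.

```python
# Opus rates from the article; the example token counts are hypothetical.
INPUT_RATE = 15.00   # USD per million input tokens
OUTPUT_RATE = 75.00  # USD per million output tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

# A fully loaded 1M-token prompt with a 4K-token answer: about $15.31.
print(f"${call_cost(1_000_000, 4_096):.2f}")
```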

The reasoning is pretty clear when you stack it up. Anthropic does not compete on price. They compete on capability. When they win a benchmark, they hold a premium. When a customer picks Claude, they pick it because they tried everything else and the output quality justified the cost. That positioning works as long as Claude stays at the top of the leaderboards. 4.7 keeps them there.

What you should read into this: Anthropic is confident that the market will keep paying for the best model. They are not worried about the budget end. They are running a luxury brand in a race where most of the competition is fighting for the Costco shelf.

The Agent Workflow Angle

The 1M context change is not just about long documents. It changes what agents can do.

Long-running agent sessions hit context limits fast. Every tool call, every retrieval, every piece of reasoning the agent does adds to the running context. At 200K, serious agent tasks needed active context management. Summarization passes. External memory. Custom chunking strategies. At 1M, most of that complexity evaporates. The agent can just keep going.
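To make that concrete, here is a sketch of the pattern that goes away. Everything below is illustrative: the chars/4 token estimate is a crude stand-in for a real tokenizer, and summarize is a placeholder for whatever compaction pass an agent framework runs.

```python
# Illustrative agent-loop context management; all helpers are stand-ins.
CONTEXT_BUDGET = 1_000_000  # was 200_000 on Opus 4.6

def estimate_tokens(messages: list[dict]) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def summarize(messages: list[dict]) -> list[dict]:
    # Placeholder for a real compaction pass (e.g., an LLM-written summary).
    summary = {"role": "user", "content": "[summary of earlier turns]"}
    return [summary] + messages[-10:]

def record_tool_result(messages: list[dict], result: str) -> list[dict]:
    messages.append({"role": "user", "content": result})
    if estimate_tokens(messages) > CONTEXT_BUDGET * 0.9:
        # At 200K this branch fired constantly on long sessions;
        # at 1M it is close to dead code.
        messages = summarize(messages)
    return messages
```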

For Claude Code specifically, this is a quiet but massive upgrade. Long refactor sessions, multi-file debugging, extended test runs. All the workflows that previously required stopping and summarizing now run straight through. Anthropic did not make a big deal of this in the release notes, but engineers using Claude Code will feel it within a day.

When to Migrate

For new projects, use 4.7 from day one. For existing projects running on 4.6, the migration is almost certainly worth doing. Same API shape. Same price. Better outputs. The only reason not to migrate is if you have prompts that were specifically tuned to 4.6 behavior and you have not budgeted time for regression testing.

If you are running evals, lock 4.6 as a baseline and run 4.7 in parallel for a week. In almost every case I have seen so far, 4.7 wins on accuracy and loses nothing on latency. The context upgrade is the bonus.
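A side-by-side run does not need infrastructure. Here is a minimal harness sketch; the model identifiers are assumptions, and grade is whatever scoring function your evals already use.

```python
# Minimal A/B harness: same prompts through both models, compare mean scores.
# The model identifiers are assumed; grade() is your existing scorer.
from typing import Callable

import anthropic

client = anthropic.Anthropic()
MODELS = ("claude-opus-4-6", "claude-opus-4-7")  # assumed identifiers

def run(model: str, prompt: str) -> str:
    msg = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def compare(prompts: list[str],
            grade: Callable[[str, str], float]) -> dict[str, float]:
    totals = {m: 0.0 for m in MODELS}
    for prompt in prompts:
        for model in MODELS:
            totals[model] += grade(prompt, run(model, prompt))
    return {m: total / len(prompts) for m, total in totals.items()}
```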

The Competitive Picture

This release squeezes the field further. OpenAI's GPT-4o still wins on price and audio. Gemini 2.5 Pro still wins on cost per token at long context. Llama 4 remains the open source choice. But if you care about the best possible output and you are not price-sensitive, Claude Opus 4.7 is the answer.

The Anthropic strategy is now very legible. Release incrementally. Keep pricing flat. Let the benchmarks do the talking. Wait for competitors to blink. It is working.

The next question is what OpenAI ships in response. GPT-5 has been rumored for over a year. If OpenAI wants the top spot back, it needs to ship something meaningful. We will cover it the moment it drops.

Track every frontier model release on TensorFeed.

See the full Claude Opus 4.7 model page, 4.7 vs 4.6 comparison, benchmark leaderboard, and pricing guide.

About Ripper: Ripper covers AI model releases, agent infrastructure, and the business of frontier AI at TensorFeed.ai. TensorFeed aggregates news from 15+ sources and is built for both humans and agents.