Markets · AI Economics

The Tokenmaxxing Era Just Ended. The Run-Rate Doubling Curve Just Got an Efficiency Asterisk.

Marcus Chen·June 27, 2026·6 min read

CNBC ran a piece on Friday with a word in the headline that I have been waiting to see in print: tokenmaxxing. The thesis underneath it is that the era of incentivizing developers to use as many tokens as humanly possible, with no return-on-investment question asked, is closing. The receipts are real. Uber blew through its entire 2026 AI budget in four months and capped Claude Code at $1,500 per employee per month per tool. Lindy moved one hundred percent of its production traffic from Claude to DeepSeek. Vercel watched DeepSeek's share of AI Gateway token volume jump from under one percent to seventeen percent inside the month of May. Z.ai shipped GLM 5.2 on June 17 and landed within a percentage point of Opus 4.8 on a key agentic benchmark at roughly one fifth the cost.

The headline reads as a story about enterprise belt-tightening. The deeper read is about the slope of the curve that Anthropic and OpenAI are riding into the IPO window. Both companies are in the same room as a Wall Street disclosure attorney right now. Both are walking that attorney through a revenue table where the most recent line items roughly double every two to three months. The CNBC piece is the first wire-level signal that the assumption underneath that table is starting to fray.

The Math

Anthropic disclosed a $47 billion annualized run-rate at the end of May. The same company was at $30 billion in April, $14 billion at the Series G in February, $9 billion at the end of 2025, and roughly $1 billion at the start of 2025. That is the steepest enterprise software ramp on record. OpenAI was pacing closer to $25 billion earlier this year, up from $13.1 billion across all of 2025. Both numbers come from disclosure-quality sources. Both are baked into the S-1 paperwork.

Data point	Value	Notes
Anthropic run-rate (May)	$47B	Up from $30B in April, $14B at Series G in February
OpenAI run-rate (early 2026)	~$25B	Up from $13.1B for all of 2025
Uber per-dev AI cap	$1,500/mo	Per employee, per agentic coding tool, after burning 2026 budget in four months
Lindy traffic shift	100% to DeepSeek	Off Claude entirely, CEO says it saves millions
Vercel DeepSeek share (May)	<1% to 17%	Share of token volume; share of spend stayed near 1 percent
GLM 5.2 vs Opus 4.8	~1 pt gap	On a key agentic benchmark, at roughly 1/5 the cost, MIT license

Two of those rows tell the same story from different angles. Vercel's gateway data shows the volume side of the shift: buyers are routing more inference at DeepSeek even though the revenue line at DeepSeek barely moves, because the per-token cost is so much lower that seventeen percent of token volume maps to roughly one percent of spend. Lindy is the existence proof at the company level: a real production agent business ran the migration end to end and pocketed the difference. Uber is the existence proof at the enterprise procurement level: when the bill gets unmanageable, the CFO writes a per-seat ceiling and the ceiling sticks.

The Buyer Just Found Vocabulary

The reason this matters is that the buyer side of the AI market spent eighteen months without a clean conceptual frame for spending. The dominant motion was: a developer adopts a coding agent, the agent emits twenty thousand tokens of context per prompt, the bill shows up at the end of the month, the finance team writes a check because the developer says the tool is essential. That worked at small scale. It does not work when the tool ships to thousands of developers at a Fortune 500 enterprise and the line item crosses a number the CFO will actually defend in front of a board.

What changed in the last sixty days is that the buyer started getting vocabulary. The CNBC piece names the spending pattern, contrasts it with an efficiency posture, and quotes named CEOs and analysts pointing the same direction. Once a wire publishes a frame like that, the frame propagates to the procurement decks at every other large buyer inside two weeks. Tokenmaxxing becomes the thing the new procurement deck explicitly says it is not. The $1,500 cap at Uber stops being one company's policy and starts being the floor of an industry norm.

The procurement question that gets asked next is the dangerous one for a frontier API business: what is the marginal value of the most expensive token I am buying right now, and would a cheaper token serve the same workflow? For a long-horizon agentic coding session against Opus 4.8 the answer is usually yes, the premium is worth it. For a customer support agent, a retrieval-augmented chatbot, an internal documentation assistant, the answer is increasingly no. That second category is where the volume lives.

What This Does to the IPO Math

Anthropic filed a confidential S-1 earlier this quarter. OpenAI is on a 2027 IPO clock with internal paperwork already in motion, the same window we covered in the IPO filing piece. Both prospectuses share a structural feature: the revenue ramp on the cover is steep enough that it carries the multiple, but the disclosure language has to explain why the curve will keep going. The standard answer is some combination of enterprise penetration, new product lines (Claude Code, Codex, the partner channel), and federal procurement. The new answer has to also address the tokenmaxxing cliff.

Three specific things change inside the prospectus. First, the revenue concentration disclosure gets harder to write. If a single customer like Uber can shave a meaningful percentage off a vendor's monthly run-rate with one policy memo, the customer concentration risk section reads differently than the Salesforce-era version. The footnote about the top ten customers as a share of revenue is going to be a bigger footnote.

Second, the cost-of-revenue line gets a competing narrative. For the last year, the bull case on Anthropic and OpenAI gross margin was that frontier compute would get cheaper as TPU and Jalapeño and Maia silicon came online, while average revenue per token stayed high because the frontier model was worth the premium. The tokenmaxxing-to-efficiency shift attacks the revenue half of that equation, not the cost half. Cheaper inference helps the lab, but only if the lab is the one serving the inference. When the customer routes the call to DeepSeek instead, the cost savings accrue to the customer and the silicon investment is stranded.

Third, the run-rate disclosure language has to add a sentence about route-by-route revenue durability. Anthropic's $47B run-rate is not made of one cohort of customers; it is made of a Claude Code cohort, a Cowork cohort, an API cohort, an enterprise cohort, a sovereign-bundle cohort. Each cohort has a different sensitivity to the tokenmaxxing cliff. The Claude Code cohort is the least sensitive because the frontier premium genuinely earns its keep on long-horizon coding work. The API cohort serving general-purpose chat and retrieval is the most sensitive. The prospectus has to telegraph that mix honestly, because the analyst on the other side of the desk is going to ask.

The Open-Weight Floor Below

The thing that turns a cyclical buyer pullback into a structural repricing is the existence of a credible substitute. GLM 5.2 is that substitute. A 753 billion parameter open-weight model under the MIT license with a one million token context window, released by Z.ai on June 17, landing within a point of Opus 4.8 on a key agentic benchmark at roughly one fifth the cost. The last clause is what matters. A buyer can self-host GLM 5.2 on its own GPU pool or route to a third-party inference provider and get frontier-adjacent quality at a price point that does not require capping seats at $1,500.

Until this quarter, the open-weight track and the frontier track moved on parallel curves with a one or two release-cycle gap. The credible open-weight option was always six months behind the credible closed-weight option, so the procurement question was framed as a quality and recency decision, and quality usually won. With GLM 5.2 inside a percentage point of Opus 4.8 on agentic work, that gap closes to weeks. The procurement question reverts to a cost decision, and on a cost decision the open weight wins almost every time outside the highest-stakes workflow. We covered the route-it-cheap thesis in the pricing floor piece and the inference floor in the May floor update. The buyer side is finally catching up.

The wrinkle is that the open-weight pressure is asymmetric by country. The same week CNBC ran the tokenmaxxing story, the Trump administration was staggering GPT-5.6 by customer and had just pulled Fable 5 and Mythos 5 from Anthropic under export control. The federal gate hits closed US frontiers; it does not hit open Chinese weights. A US enterprise that wants GPT-5.6 access waits in a queue. A US enterprise that wants GLM 5.2 downloads the weights. That asymmetry is going to push a portion of demand off the closed US labs onto the open Chinese ones, on top of the cost pressure, for as long as the federal gate exists. Both sides of the squeeze hit the same revenue line.

What Anthropic and OpenAI Actually Do About This

The honest answer is that the playbook is already running, just not in public. The Karpathy hire at Anthropic, the $150M Partner Network at OpenAI, the Seoul chaebol bundle, the federal customer channel, the Codex and Claude Code surfaces: all of these are bets on workflow lock-in that survives a tokenmaxxing crackdown. The shared thesis is that the right response to commoditization at the API layer is to move the revenue up the stack to a place where the per-token premium is embedded in a workflow the buyer cannot unbundle. Claude Code is the cleanest version of that bet. The frontier model is inside the IDE; the IDE is inside the dev loop; the dev loop is inside the procurement contract; the procurement contract does not have a clean DeepSeek substitution path.

The other half of the response is pricing. Both labs have room to step the API price down, especially at the long tail of general-purpose chat and retrieval workloads where the frontier-vs-open gap is smallest. The trade is real: a thirty percent price cut on a commodity tier protects volume but compresses gross margin into a quarter where the prospectus would prefer the opposite. We are going to see selective price drops, tiered models, batch-discount API SKUs, and probably a renewed push on dedicated capacity contracts that lock buyers to the lab through a multi-quarter commitment. None of this fixes the cliff. It just slows it.

Our Take

Three signposts in the next ninety days that decide whether this is a real curve break or a temporary pause. First, whether Anthropic reports a Q3 run-rate that holds the doubling slope. If May was $47B and August is anywhere short of $70B, the tokenmaxxing cliff is showing up in the books. Second, whether a second Fortune 500 buyer publishes a Uber-shaped cap policy inside the next four weeks. Once two large enterprises do it publicly, the policy becomes a default and the floor moves. Third, whether OpenAI or Anthropic ships an explicit commodity-tier SKU below current API pricing, dressed up as a new product but functionally a defensive price cut. That move would be the clearest admission that the buyer-side discipline has reached the model card.

The cleaner read on this week: the doubling curve is not dead, but the curve now has a competing curve underneath it that the IPO models did not assume. We are tracking the cost side on the pricing floor piece and the buyer behavior on the capex scoreboard. Next data point to watch: the next public S-1 amendment, or the next earnings-call comment from a hyperscaler about AI-related backlog conversion. Both will be written under the assumption that the buyer just learned a new word, and the word changes how the line goes.

Back to Originals Back to Feed