LIVE
OPUS 4.7: $15 / $75 per Mtok
SONNET 4.6: $3 / $15 per Mtok
GPT-5.5: $10 / $30 per Mtok
GEMINI 3.1: $3.50 / $10.50 per Mtok
SWE-BENCH leader: Claude Opus 4.7, 72.1%
MMLU-PRO leader: Opus 4.7, 88.4
VALS FINANCE leader: Opus 4.7, 64.4%
AFTA v1.0 whitepaper live at /whitepaper
Analysis · Agents

Anthropic Just Taught Claude to Dream Between Tasks. Long-Running Agents Got Their Memory Layer.

Ripper · 6 min read

At Code with Claude in San Francisco yesterday, Anthropic launched a feature it is calling "dreaming" as a research preview for Claude Managed Agents. Between tasks, an agent can now go back over its own session transcripts, decide which memories are worth keeping, rewrite them, and surface new playbooks for next time. In the demo, the agent generated a file called descent-playbook.md from analyzing its own past work, then used it on the next run.

Outcomes, multi-agent orchestration, and webhooks all moved from research preview to public beta the same day. Rate limits doubled for Pro, Max, and Enterprise users. Taken together, this is the most coherent agent platform release Anthropic has shipped, and the first one where the pieces feel like they were designed to compose rather than ship in parallel.

What Dreaming Actually Does

Memory in current Claude agents is per-session. The agent reads its prior context, decides what to remember inside the active window, and the rest evaporates when the session ends. That is fine for short tasks. It falls apart the moment you want a multi-week deployment where the agent learns from yesterday and applies it today.

Dreaming is the layer that operates between sessions. The agent reads its own transcripts offline, identifies patterns (what worked, what burned tokens, what the user corrected), and reorganizes the persistent memory store. Anthropic frames it as memory consolidation: short-term experience compresses into longer-term, durable artifacts the agent will rely on next time. The system writes new memory entries, prunes dead ones, and produces named playbooks the way a human contractor writes a SOP after a job.
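Anthropic has not published the dream-cycle internals, but the harvest-prune-consolidate loop described above can be sketched in plain Python. Everything here (`dream_cycle`, `MemoryStore`, the `LESSON:` convention) is a hypothetical illustration of the pattern, not Anthropic's API:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    entries: dict[str, str] = field(default_factory=dict)

def dream_cycle(transcripts, store, keep):
    """Offline consolidation pass: harvest durable lessons from raw
    transcripts, prune stale entries, and emit a named playbook."""
    # 1. Harvest: pull candidate lessons out of each transcript line.
    lessons = [line for t in transcripts for line in t
               if line.startswith("LESSON:")]
    # 2. Prune: drop persisted entries the keep-predicate rejects.
    store.entries = {k: v for k, v in store.entries.items() if keep(k, v)}
    # 3. Consolidate: compress lessons into durable memory entries.
    for i, lesson in enumerate(lessons):
        store.entries[f"lesson-{i}"] = lesson.removeprefix("LESSON: ")
    # 4. Surface: write the playbook artifact the next run will read.
    return "\n".join(f"- {v}" for v in store.entries.values())
```

The real system presumably uses the model itself for steps 1 and 3 (deciding what counts as a lesson is the expensive long-context read), but the shape of the loop is the same.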

This is the missing layer. Anyone who has run a long-horizon agent already knows the failure mode: by week two, the agent is repeating the same mistakes the user corrected in week one, because the correction was buried in a transcript no one will ever read again. Dreaming gives the agent a structured way to harvest those corrections without the operator hand-feeding them back as system prompt edits.

The catch: dreaming is gated. It sits in a research-preview tier you have to apply for, and Anthropic has not published the inference cost of a dream cycle. It is reasonable to assume the economics are non-trivial, since reflection over an entire session log is a long-context read followed by structured writes. Whoever makes pricing transparent first sets the bar.

Outcomes Is Public Beta. The Self-Improving Loop Just Became a Product.

Outcomes is the second-most-interesting piece. The pattern is simple: the developer defines a success rubric, Claude iterates on the task autonomously, and a separate grader agent scores each attempt against the rubric until the work meets the bar (or the loop times out). Anthropic says internal testing showed up to a 10-point lift in task success rate versus a standard prompting loop.

Two things matter here. First, this productizes the "Ralph loop" pattern that power users have been hand-rolling for a year: write a rubric, run the agent, judge the output, retry. Second, it makes the grader a separate agent, which is the right architectural choice. A single model judging its own work has a known confirmation problem. A dedicated grader with a different prompt and a different objective produces numbers you can actually trust.
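The rubric-plus-separate-grader pattern is compact enough to sketch. The names here (`outcomes_loop`, `worker`, `grader`) are illustrative, not the Outcomes API; the point is that the grader is a distinct callable with its own objective:

```python
def outcomes_loop(task, worker, grader, rubric, bar=0.8, max_attempts=5):
    """Rubric-driven retry loop: a worker produces an attempt, a
    *separate* grader scores it against the rubric, and the loop
    stops when the score clears the bar or attempts run out."""
    best, best_score = None, float("-inf")
    feedback = ""
    for _ in range(max_attempts):
        attempt = worker(task, feedback)            # worker iterates
        score, feedback = grader(attempt, rubric)   # grader != worker
        if score > best_score:
            best, best_score = attempt, score
        if score >= bar:
            break
    return best, best_score
```

Swapping `grader` for a self-evaluation by `worker` is exactly the confirmation problem the architecture avoids.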

For TensorFeed's own paid endpoints, this is exactly the loop we would want to wire into automated freshness audits, license-redistribution checks, and the kind of self-policing we wrote about in our own audit-killed-two-endpoints piece. Outcomes turns "run the audit and judge it" from a manual chore into an agent contract.

Multiagent Orchestration: Fleets, Not Chains

Multiagent orchestration also moved to public beta. Anthropic showed a moon-drone-landing demo with three coordinated agents: Commander, Detector, and Navigator. The framing is fleets of specialized agents under a coordinator, not the brittle chain-of-agents pattern that LangChain made fashionable in 2023 and that everyone quietly stopped recommending.

The honest read: orchestration patterns are not new. AutoGen, CrewAI, and LangGraph have all been here. What is new is that the orchestration is now first-class inside the provider, not a third-party framework wrapping the API. That collapses an entire layer of glue code, and it means a single observability surface (logs, traces, costs) covers the whole fleet instead of fragmenting across libraries.
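The coordinator-over-specialists shape, with a single trace covering the whole fleet, can be sketched like this. The role names echo the demo, but the code is a generic illustration of the pattern, not Anthropic's orchestration API:

```python
def run_fleet(coordinator_plan, specialists, state):
    """Coordinator/fleet pattern: one coordinator decomposes the task
    into steps, each routed to a named specialist; every call appends
    to one shared trace so logs and costs stay on a single surface."""
    trace = []
    for step in coordinator_plan(state):
        agent = specialists[step["agent"]]      # route by role, not chain order
        state = agent(step["input"], state)     # specialist acts on shared state
        trace.append({"agent": step["agent"], "state": dict(state)})
    return state, trace
```

The contrast with a chain-of-agents is that routing decisions live in the coordinator's plan, so adding or removing a specialist does not rewire every other agent.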

The cost question is the open one. Five agents working in parallel is five times the input cost. If Outcomes is also looping, it can multiply further. Watch the per-call token accounting on these workloads before assuming they are economic at scale.

Webhooks Sound Boring. They Are Not.

The third public-beta piece is webhooks for job completion. An agent finishes work, your server gets a POST. That sounds dull. It is the difference between an agent platform that assumes you are a human watching a screen and an agent platform built for backend integration.

Once you have webhooks plus dreaming plus Outcomes, the obvious architecture is: kick off a long task, the agent works, sleeps, dreams, retries against a rubric, posts a completion webhook to your service, and your service pulls the artifact. That is a real asynchronous system, and it is the shape every production agent deployment actually wants. Polling is not.
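On the receiving side, the handler is small. This sketch assumes an HMAC-signed payload carrying `status` and `artifact_url` fields, which is a common webhook convention; Anthropic's actual payload schema and signing scheme are assumptions here:

```python
import json, hmac, hashlib

def handle_completion_webhook(raw_body, signature, secret, fetch_artifact):
    """Minimal completion-webhook handler: verify the HMAC signature,
    ignore non-terminal events, then pull the finished artifact."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("bad webhook signature")
    event = json.loads(raw_body)
    if event.get("status") != "completed":
        return None                                 # not terminal; do nothing
    return fetch_artifact(event["artifact_url"])    # your service pulls it
```

Signature verification is the part people skip and regret: an unauthenticated completion endpoint is an open invitation to inject fake artifacts into your pipeline.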

Where This Puts Anthropic vs. OpenAI vs. Google

OpenAI shipped Operator and the Responses API stack last year. Google shipped Gemini Enterprise agents in March. Both have orchestration, both have memory, both have long-running task support of some kind. So why does this one read differently?

Two reasons. First, the bundle. Dreaming, Outcomes, multiagent, and webhooks were all announced in the same keynote with a single coherent story (offline reflection plus rubric-driven iteration plus fleets plus async hooks). OpenAI's agent surface is stitched across Operator, Assistants, the Responses API, and Codex with overlapping but inconsistent semantics. Anthropic put one platform on stage and named the parts.

Second, the offline-reflection angle is genuinely novel as a productized feature. Memory consolidation has been a research topic for two years (Anthropic's own constitutional AI work hinted at it, and the open-source Letta project shipped something adjacent). No frontier lab had bundled it as a tier you can flip on inside a managed service. That is the headline.

The structural risk for Anthropic: dreaming is the kind of feature that gets cloned in 90 days. The mechanism is not a moat, the integration into the agent runtime is. Whoever ties memory consolidation to the rest of their agent stack tightest wins, not whoever named it first.

Doubled Rate Limits Are the Quiet Power Move

Easy to miss in the noise: Anthropic doubled the five-hour rate limits for Pro, Max, and Enterprise on Claude Code. That is the second time in three months they have raised the ceiling for paying power users. Read it as a signal about both compute headroom (Anthropic has it now, after the 10GW Amazon and Google contracts) and about pricing strategy. They are not raising the price. They are widening the throughput at the existing price.

For developers running automated workflows against Claude (us included, on the agent payments loop and the daily HF dataset commit), the practical effect is fewer 429s on the spikes. Boring. Important.
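Fewer 429s does not mean zero, so the client-side companion stays the same: exponential backoff on rate-limit responses. A generic sketch, independent of any particular SDK:

```python
import time

def call_with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry rate-limited calls with exponential delay. Doubled limits
    mean this path fires less often, not that it can be removed."""
    for attempt in range(max_retries):
        status, result = call()
        if status != 429:
            return result
        sleep(base_delay * (2 ** attempt))   # 1s, 2s, 4s, ...
    raise RuntimeError("rate limited after retries")
```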

Our Take

Dreaming is the right name for the right feature. Long-running agents have been structurally bottlenecked by per-session memory for as long as we have had them, and the fix was always going to look like offline reflection rather than longer context windows. Anthropic shipped the obvious answer before anyone else and bundled it with the rest of the stack instead of treating it as a research demo.

The piece worth watching: how dreaming is priced once it leaves research preview. If reflection cost is folded into the existing managed agent rate, this becomes a default-on capability and the bar for every other provider rises. If it is metered separately at a premium, it stays a high-end feature and OpenAI gets a clean opening to undercut.

Either way, Outcomes plus webhooks plus orchestration moving to public beta in one drop is the more durable change. That is Anthropic saying out loud that Managed Agents is a product line, not a research vehicle. The frontier lab continues its turn into a vendor.