The White House Told Anthropic to Make Fable 5 Jailbreak-Proof. Security Researchers Say That Is Not a Thing That Exists.

Kira Nolan·June 17, 2026·6 min read

AI SAFETY

The White House has told Anthropic what it will take to put Fable 5 back online for the people it just locked out. The model has to be jailbreak-proof. I have spent years reading adversarial machine learning papers, and I want to be precise about what that request is. It is a demand for a property that no deployed language model has ever had, that no published defense achieves, and that the researchers who study this for a living do not believe is currently possible.

What Washington Actually Asked For

On June 12 the administration ordered Anthropic to cut Fable 5 and Mythos 5 access for all foreign nationals, a directive The Verge described this week as export rules nobody understands. Because a global API cannot check passports at the token level, the only compliant setting was off, and both models went dark worldwide. I walked through that mechanism when it happened, in the original export control suspension.

This week the reinstatement terms came into view, and they are stricter than the original order. WIRED reported that the White House wants Anthropic to block all jailbreaks before Fable 5 returns, and that security experts told the publication this may not be possible. Anthropic reportedly sent Nicholas Carlini, one of the most cited adversarial ML researchers working today, to explain the technical reality to the government. When the company whose model is on the line sends its sharpest red-teamer to manage expectations, that tells you which direction the expectations need managing.

Jailbreak-Proof Describes Something That Does Not Exist

Here is the part the policy skips. There is no frontier model, from any lab, that is robust to jailbreaks. Not Claude, not GPT, not Gemini, not an open-weight model. Adversarial robustness has been an open research problem for more than a decade, and the consistent result across that decade is that published defenses get broken, often within months, often by the people who proposed them. Carlini built a career on exactly that pattern: a defense ships claiming to stop attacks, and a follow-up paper shows it does not.

Fable 5 is not a counterexample. It is the example. The model got pulled in the first place because Amazon researchers jailbroke it days after launch, the chain of events I traced in the hyperscaler conflict piece. Asking Anthropic to guarantee that no one will ever do that again is asking it to certify the absence of an attack nobody has invented yet. You cannot prove that negative. No one can.

The Category Error Underneath the Order

Good security regulation mandates process, not perfection. It asks for red-teaming, defense in depth, monitoring, disclosure, and a fast patch cycle. It does not ask for zero successful attacks, because every serious security professional knows zero is not on the menu.

A zero-jailbreak standard is the safety equivalent of demanding software with zero vulnerabilities. We already have a clear view of how that thinking fails. TF's own verdict on AI-discovered CVEs lands on managing a flood of imperfect findings with human gates and reproduction, not on pretending a clean bill is achievable. The bar the White House set is unfalsifiable in the direction that would let the model ship, and trivially falsifiable in the direction that keeps it dark. One clever prompt, and Anthropic has failed a test that has no passing grade.
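A rough way to see that asymmetry is the sketch below. It is my own illustration, not anything from the order or from Anthropic, and the function names are hypothetical. It assumes every red-team attempt is an independent trial with the same success probability, which real adversaries do not respect, so the bound it computes is optimistic if anything.

```python
def attack_rate_upper_bound(clean_trials: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the per-attempt jailbreak probability
    after `clean_trials` attempts that ALL failed.

    Assumption (hypothetical, simplifying): attempts are independent and
    share one success probability p. Zero successes in n trials has
    probability (1 - p)**n, so p is bounded by solving
    (1 - p)**n = 1 - confidence.
    """
    if clean_trials <= 0:
        return 1.0  # no evidence at all: nothing can be ruled out
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_trials)


def verdict(jailbreak_succeeded: list[bool]) -> str:
    """A zero-jailbreak test has only two outcomes.

    One success fails the model. Any number of clean attempts leaves the
    result open, because the attack nobody has invented yet is not in the list.
    """
    return "fail" if any(jailbreak_succeeded) else "inconclusive"
```

Under those assumptions the bound shrinks roughly as 3 divided by the number of clean attempts, but it never reaches zero for any finite test campaign. The only verdicts on offer are "fail" and "inconclusive", and there is no input that returns "pass".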

Two Resolutions, Both Bad

Read the condition literally and it is a permanent ban dressed as a safety requirement. No model meets it, so no model comes back, and the precedent quietly threatens frontier deployment for every US lab the moment a regulator invokes the same standard.

Soften it in private, which is the more likely path, and you get security theater plus something worse: an unwritten standard. If jailbreak-proof actually means hard enough to jailbreak that we are comfortable, then the real bar is whatever a phone call decides in a given week. That is not a regulation. It is discretion with a security label on it, and discretion is the thing export controls were supposed to replace with rules.

The Hole the Mandate Cannot Reach

While Washington negotiates the impossible with the one lab that answered the phone, the capability it fears is already loose. Two days after the Fable pull, Zhipu shipped GLM-5.2 as an open-weight frontier model trained on Huawei silicon, with MIT-licensed weights landing this week, which I covered in the open frontier export letter piece. You cannot make open weights jailbreak-proof, and you cannot recall them. Ars Technica put the broader point plainly this week: dangerous models are coming no matter what, because offensive capability is becoming a default feature of the frontier, not a bug to be patched out of one vendor's model.

So the mandate governs exactly the models that comply and is blind to the models that do not. It raises the bar for the lab disputing the order in public, and lowers nothing for the lab that published its weights and walked away. The threat model is inverted.

Our Take

The danger here is not that Fable 5 can be jailbroken. Every model can be jailbroken, and a government that did not know that should not be setting the terms. The danger is a regulator certifying safety as perfection, because perfection forces the regulated party to either lie or stop shipping, and both outcomes are worse than honest residual-risk management.

If the White House wants safer deployed models, the lever exists and it is boring: mandate the process, fund the red teams, require disclosure and fast patching, and accept that the residual risk is not zero because it never is. What it should not do is hang reinstatement on a guarantee that the field's best researchers, including the one Anthropic just sent into the room, will tell it flatly cannot be given. We track how the frontier models actually stack up on capability and access on our model wars page, and the number I am watching is not a jailbreak rate. It is a date. Anthropic heads back to Washington on June 22, and the question is whether anyone in the room has revised the ask from impossible to merely hard.
