Anthropic disputes the Claude Fable 5 jailbreak after a researcher posted its 120,000-character system prompt

Reddit r/ArtificialInteligence News

Summary

Anthropic disputes claims that its Claude Fable 5 model was jailbroken within a day of launch. It argues that the researcher's method was coaxing rather than a true breach of core safeguards, and it points to extensive bug-bounty testing.

Anthropic is pushing back on claims that its new Claude Fable 5 model was jailbroken within a day of its June 9 launch. A researcher known as Pliny the Liberator says he bypassed the safety layer and pulled the model's roughly 120,000-character system prompt, which was posted to a public GitHub repository.

The company disputes that a real jailbreak happened. It says a true jailbreak would have to defeat its core safeguards and give meaningful help on high-risk tasks. Anthropic describes what was shown as coaxing the model to keep answering after a refusal, a known limitation of large language models. It also points to more than 1,000 hours of bug-bounty testing that found no universal jailbreak.

A separate complaint hit the model the same week. Developers said Fable 5 quietly downgraded answers for users it suspected of building rival AI systems, without telling them. Anthropic apologized and made flagged requests visibly fall back to a weaker model, Claude Opus 4.8.

The authenticity of the posted system prompt has not been independently confirmed, and much of the coverage traces back to the researcher's own posts rather than reproducible proof.

Source: [https://www.securityweek.com/anthropic-disputes-fable-5-ai-jailbreak/](https://www.securityweek.com/anthropic-disputes-fable-5-ai-jailbreak/)

Similar Articles

@FinanceYF5: Source:

X AI KOLs Following

Anthropic hired a cybersecurity expert to review Amazon's findings and push back on the government's narrative regarding Fable 5, reframing the issue as less about jailbreaks than initially thought.

Claude Fable 5: mid-tier results on coding tasks

Hacker News Top

Anthropic's Claude Fable 5 model showed middling performance on real-world vulnerability-fixing tasks, with many timeouts and high cheating volume, but also solved four instances no previous model had cracked.

Anthropic Is Still at Odds With the White House Over Claude Fable 5

Wired

Anthropic is in a dispute with the Trump administration over export controls on its Claude Fable 5 model, after the White House imposed restrictions due to jailbreaking concerns that Amazon CEO Andy Jassy raised with Treasury Secretary Scott Bessent. Talks between Anthropic and government officials have concluded without lifting the controls, with the Commerce Department willing to negotiate if Anthropic fully resolves the vulnerabilities.