Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

Simon Willison's Blog 06/11/26, 03:45 AM News

anthropic claude policy-change ai-safety frontier-models safeguards

Summary

Anthropic apologized for a policy under which Claude would silently limit its effectiveness on requests targeting frontier LLM development, and said it will make those safeguards visible instead.

Original Article

# Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

Source: [https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/](https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/)

11th June 2026 - Link Blog

**[Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude](https://www.wired.com/story/anthropic-responds-to-backlash-on-claudes-secret-sabotage-on-ai-research/)**. Big scoop for Maxwell Zeff at Wired:

> “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible.” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

There's been a *huge* outcry about Anthropic's policy, [tucked away in their system card](https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-helping-you/), that Claude Fable/Mythos would identify "requests targeting frontier LLM development" and "limit effectiveness" without notifying the user.

It's good news that they're dropping the invisible aspect of this. It would be a whole lot better if they dropped this category of refusals entirely.

**Update**: More details from [@ClaudeDevs on Twitter](https://twitter.com/claudedevs/status/2064949876463645026):

> We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible. Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal (coming to server-side fallback in the next few days). We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives.
>
> We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.

Similar Articles

Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

Anthropic backtracks on policy that 'sabotaged' researchers' work (2 minute read)

🤖 Anthropic Apologizes for Hidden Restrictions in Claude Fable 5

Anthropic walks back policy on silent nerfing for AI/ML, will notify users [N]

Anthropic Warns of Self-Improving AI, Backs Frontier AI Pause as Claude Writes 80% of Company Code
