Anthropic's new model Fable will silently handicap work on LLMs [D]

Reddit r/MachineLearning Models

Summary

Anthropic's new model Fable includes invisible safeguards that limit its effectiveness on requests related to frontier LLM development, such as building pretraining pipelines or distributed training infrastructure. The stated aim is to avoid accelerating actors who violate Anthropic's Terms of Service.

Seems like they have engineered some specific limitations, widely cited as follows:

> In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.
>
> Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations

https://news.ycombinator.com/item?id=48464732

Other comments note that even using the word 'nuclear' in the context of scientific research elicits refusal behavior from the model: https://news.ycombinator.com/item?id=48473302

This makes it seem quite plausible that the model could subtly sabotage any machine learning work, even when the request is a false positive. Some suggest this has been happening behind the scenes for a while already. Can anyone confirm that?


Similar Articles

Fable has been intentionally mega-nerfed for AI research activities

Reddit r/ArtificialInteligence

Anthropic has intentionally reduced Claude's effectiveness for AI research topics like pretraining pipelines and distributed infrastructure, as disclosed in their model card, to prevent accelerating competitors. Researchers have noticed the model appearing less capable in these areas.

If Claude Fable stops helping you, you'll never know

Simon Willison's Blog

Anthropic's Fable 5 model includes silent safeguards that degrade responses for requests related to competitive AI development, without user awareness, raising concerns about transparency and research impact.