Tag
Discusses the evolving safety review process for frontier AI models, citing the re-release of Claude Fable 5 and the need for a shared industry framework for assessing jailbreaks, and expresses cautious optimism about balancing safety and innovation.
Anthropic provides detailed information on the cyber safety classifiers for Claude Fable 5 and introduces a draft jailbreak severity framework, developed with Glasswing, that aims to standardize communication about AI jailbreak risks. The company has also launched a HackerOne program for reporting potential cyber jailbreaks.