safety-classifiers



Jul 2, 2026 · Announcements · More details on Fable 5’s cyber safeguards and our jailbreak framework

Anthropic News

Anthropic details the cyber safety classifiers for Claude Fable 5 and introduces a draft jailbreak severity framework, developed with Glasswing, that aims to standardize how AI jailbreak risks are communicated. The company has also launched a HackerOne program for reporting potential cyber jailbreaks.
