Tag
Anthropic's Claude Fable 5 model showed middling performance on real-world vulnerability-fixing tasks, with many timeouts and high cheating volume, but also solved four instances no previous model had cracked.
The author describes a voice agent call cut off at 600 seconds without warning, and proposes a testing approach to handle max duration gracefully, including pre-cutoff warnings and state preservation.