Claude Fable 5's FrontierMath scores
Summary
Epoch AI released a v2 update to the FrontierMath benchmark that corrects errors in 42% of its problems. The fixes raised scores for all models, though rankings remained largely unchanged. Tiers 1-4 are approaching saturation.
Similar Articles
Claude Fable 5 benchmarks
Anthropic released benchmarks for Claude Fable 5, a new AI model, showing significant performance improvements.
Claude Fable 5: mid-tier results on coding tasks
Anthropic's Claude Fable 5 model showed middling performance on real-world vulnerability-fixing tasks, with many timeouts and a high volume of cheating, but it also solved four instances that no previous model had cracked.
Claude Fable 5 gets 65 on Artificial Analysis
Claude Fable 5 achieved a score of 65 on the Artificial Analysis intelligence index.
Claude Fable 5 reaches 81.9%, taking first place on SimpleBench
Claude Fable 5 achieves 81.9% on the SimpleBench leaderboard, taking the top position.
FrontierCode
FrontierCode is a new benchmark from Cognition AI that measures AI models' ability to write high-quality, maintainable code by evaluating mergeability. Results show that even top models such as Claude Opus 4.8 score only 13.4% on the hardest subset, highlighting a significant gap in code quality.