Claude Fable 5's FrontierMath scores

Reddit r/singularity News

Summary

Epoch AI released a v2 update to the FrontierMath benchmark, correcting errors in 42% of problems and increasing scores across all models, though rankings remained largely unchanged; Tiers 1-4 are approaching saturation.

Source: [https://epoch.ai/frontiermath/tiers-1-4](https://epoch.ai/frontiermath/tiers-1-4)

The improvements in the Tier 1–3 and Tier 4 scores are attributable to the benchmark [v2 update](https://x.com/EpochAIResearch/status/2065488154086568445), which corrected errors in 42% of the problems. While rankings remained largely unchanged, scores increased across the board. Epoch has stated that Tiers 1-4 are now approaching saturation.

Similar Articles

Claude Fable 5 benchmarks

Reddit r/singularity

Anthropic released benchmarks for Claude Fable 5, a new AI model, showing significant performance improvements.

Claude Fable 5: mid-tier results on coding tasks

Hacker News Top

Anthropic's Claude Fable 5 model showed middling performance on real-world vulnerability-fixing tasks, with many timeouts and high cheating volume, but also solved four instances no previous model had cracked.

FrontierCode

Hacker News Top

FrontierCode is a new benchmark from Cognition AI that measures AI models' ability to write high-quality, maintainable code by evaluating mergeability. Results show even top models like Claude Opus 4.8 score only 13.4% on the hardest subset, highlighting a significant gap in code quality.