ProgramBench result for Fable 5 is in, doubling Opus 4.8 even with 4.8 fallback "99% of the runs"
Summary
ProgramBench results show Fable 5 scoring double what Opus 4.8 scores, even though it fell back to Opus 4.8 in 99% of runs.
Similar Articles
Fable 5 benchmark with Remotion video
Fable 5 shows overall improvement over Opus 4.8 in video generation benchmarks, but Gemini 3.1 Pro demonstrates more artistic vision despite issues with tool calls and buggy code.
Minebench Trains 5.2->5.5 and Opus 4.6->Fable 5
A comparison of various GPT and Claude Opus model versions on the Minebench (Minecraft) benchmark, with detailed judgments between GPT-5.5 and Fable 5 on specific builds.
Opus 4.7 scores lower than 4.6 and 4.5 on SimpleBench
Claude Opus 4.7 shows decreased performance compared to versions 4.6 and 4.5 on SimpleBench evaluation.
On a difficult new SWE benchmark, ProgramBench, GPT5.5 high/xhigh solves a task for first time, significantly outperforms Opus 4.7
GPT-5.5 achieved the first solve on the difficult ProgramBench SWE benchmark, significantly outperforming Opus 4.7.
@orca_build: Anthropic’s new Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1… …but it’s noticeably better at UI tasks.…
Anthropic's Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1 but excels at UI tasks; Orca's orchestration enables Codex to delegate UI tasks to Claude Code.