ProgramBench result for Fable 5 is in, doubling Opus 4.8 even with 4.8 fallback "99% of the runs"

Reddit r/singularity 06/16/26, 01:54 PM News

benchmark ai-models comparison results fable-5 opus-4-8 programbench

Summary

ProgramBench results show Fable 5 achieving double the performance of Opus 4.8, even with fallback to 4.8 in 99% of runs.

https://x.com/ValsAI/status/2066760552156971291

Quite an interesting result. The ProgramBench creator seems to imply that there is a difference between Fable 5 quickly falling back to 4.8 and 4.8 running on its own, even on tasks where most of the tokens are consumed by 4.8. Why does 4.8 in a quick Fable 5 handoff use 2x more tokens than 4.8 alone?

Similar Articles

Fable 5 benchmark with remotion video

Minebench Trains 5.2->5.5 and Opus 4.6->Fable 5

Opus 4.7 scores lower than 4.6 and 4.5 on SimpleBench
Claude Opus 4.7 shows decreased performance compared to versions 4.6 and 4.5 on SimpleBench evaluation.

On a difficult new SWE benchmark, ProgramBench, GPT5.5 high/xhigh solves a task for first time, significantly outperforms Opus 4.7
GPT5.5 achieved the first solve on the difficult ProgramBench SWE benchmark, significantly outperforming Opus 4.7.

@orca_build: Anthropic’s new Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1… …but it’s noticeably better at UI tasks.…