ProgramBench result for Fable 5 is in, doubling Opus 4.8 even with 4.8 fallback "99% of the runs"

Reddit r/singularity News

Summary

ProgramBench results show Fable 5 achieving double the performance of Opus 4.8, even with fallback to 4.8 in 99% of runs.

https://x.com/ValsAI/status/2066760552156971291

Quite an interesting result. The ProgramBench creator seems to imply that there is a difference between Fable 5 falling back to 4.8 quickly and 4.8 running on its own, even across tasks where most of the tokens come from 4.8. Why does 4.8, after a quick handoff from Fable 5, use 2x more tokens than 4.8 alone?
