Opus 4.7 scores lower than 4.6 and 4.5 on SimpleBench
Summary
Claude Opus 4.7 scores lower than Opus 4.6 and Opus 4.5 on the SimpleBench evaluation.
Similar Articles
Differences Between Opus 4.7 and Opus 4.8 on MineBench
Opus 4.8 shows improved build quality and lower cost compared to Opus 4.7 on the MineBench 3D block-structure benchmark, though with some inconsistencies. The model demonstrates streamlined thinking and more efficient inference.
@orca_build: Anthropic’s new Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1… but it’s noticeably better at UI tasks.…
Anthropic's Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1 but excels at UI tasks; Orca's orchestration enables Codex to delegate UI tasks to Claude Code.
Differences Between Claude Opus 4.8 and Claude Fable 5 on MineBench
A detailed comparison of Claude Opus 4.8 and Claude Fable 5 on the MineBench benchmark, highlighting trade-offs in inference time, cost, build quality, and prompting sensitivity.
ProgramBench result for Fable 5 is in, doubling Opus 4.8 even with 4.8 fallback "99% of the runs"
ProgramBench results show Fable 5 achieving double the performance of Opus 4.8, even with fallback to 4.8 in 99% of runs.
DeepSWE Opus 4.8 results have been released.
The DeepSWE results for Opus 4.8 have been released.