Opus 4.7 scores lower than 4.6 and 4.5 on SimpleBench

Reddit r/singularity 04/22/26, 01:03 PM Models

model-evaluation performance-regression benchmarking

Summary

Claude Opus 4.7 scores lower than Opus 4.6 and Opus 4.5 on the SimpleBench evaluation.

Similar Articles

Differences Between Opus 4.7 and Opus 4.8 on MineBench

Reddit r/singularity

Opus 4.8 shows improved build quality and lower cost compared to Opus 4.7 on the MineBench 3D block-structure benchmark, though with some inconsistencies. The model demonstrates streamlined thinking and more efficient inference.

@orca_build: Anthropic’s new Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1… …but it’s noticeably better at UI tasks.…

X AI KOLs Timeline

Anthropic's Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1 but excels at UI tasks; Orca's orchestration enables Codex to delegate UI tasks to Claude Code.

Differences Between Claude Opus 4.8 and Claude Fable 5 on MineBench

ProgramBench result for Fable 5 is in, doubling Opus 4.8 even with 4.8 fallback "99% of the runs"

DeepSWE Opus 4.8 results have been released.
