@datacurve: Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lo…
Summary
Opus 4.8 is now available on DeepSWE. At its default high thinking effort, it scores 6% higher than Opus 4.7 at xhigh while also lowering the average cost per task.
Cached at: 05/31/26, 04:53 PM
Opus 4.8 is now on DeepSWE.
On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task. https://t.co/HGLWsmDxZu
Similar Articles
DeepSWE Opus 4.8 results have been released.
Benchmark results for Opus 4.8 on DeepSWE have been released.
@danshipper: vibe check: Opus 4.7 feels like it's gotten a lot better recently. Both at coding and writing / strategy / deep thinkin…
Users report noticeable improvements in Opus 4.7's performance for coding, writing, and strategic reasoning tasks.
Differences Between Opus 4.7 and Opus 4.8 on MineBench
Opus 4.8 shows improved build quality and lower cost compared to Opus 4.7 on the MineBench 3D block-structure benchmark, though with some inconsistencies. The model demonstrates streamlined thinking and more efficient inference.
Opus 4.7 scores lower than 4.6 and 4.5 on SimpleBench
Claude Opus 4.7 shows decreased performance compared to versions 4.6 and 4.5 on SimpleBench evaluation.
@orca_build: Anthropic’s new Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1… …but it’s noticeably better at UI tasks.…
Anthropic's Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1 but excels at UI tasks; Orca's orchestration enables Codex to delegate UI tasks to Claude Code.