@datacurve: Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lo…

X AI KOLs Following 05/30/26, 09:21 PM Models

Summary

Opus 4.8 is now available on DeepSWE. On its default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, with a lower average cost per task.

Original Article

Cached at: 05/31/26, 04:53 PM

Opus 4.8 is now on DeepSWE.

On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task. https://t.co/HGLWsmDxZu

Similar Articles

DeepSWE Opus 4.8 results have been released.

Reddit r/singularity

The DeepSWE benchmark results for Opus 4.8 have been released.

@danshipper: vibe check: Opus 4.7 feels like it's gotten a lot better recently. Both at coding and writing / strategy / deep thinkin…

X AI KOLs Following

Users report noticeable improvements in Opus 4.7's performance for coding, writing, and strategic reasoning tasks.

Differences Between Opus 4.7 and Opus 4.8 on MineBench

Reddit r/singularity

Opus 4.8 shows improved build quality and lower cost compared to Opus 4.7 on the MineBench 3D block-structure benchmark, though with some inconsistencies. The model demonstrates streamlined thinking and more efficient inference.

Opus 4.7 scores lower than 4.6 and 4.5 on SimpleBench

Reddit r/singularity

Claude Opus 4.7 scores lower than Opus 4.6 and Opus 4.5 on the SimpleBench evaluation.

@orca_build: Anthropic’s new Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1… …but it’s noticeably better at UI tasks.…

X AI KOLs Timeline

Anthropic's Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1 but excels at UI tasks; Orca's orchestration enables Codex to delegate UI tasks to Claude Code.
