DeepSWE Opus 4.8 results have been released.

Reddit r/singularity 05/30/26, 03:32 PM Models

Summary

Results for Opus 4.8 on the DeepSWE benchmark have been released.

Similar Articles

@datacurve: Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lo…

X AI KOLs Following

Opus 4.8 is now available on DeepSWE. On its default high thinking effort, it scores 6% higher than Opus 4.7 at xhigh effort, with a reduced average cost per task.

DeepSWE benchmarks indicate that DeepSeek v4 Pro only passes 8% of tasks

Reddit r/LocalLLaMA

A discussion of DeepSWE benchmark results showing that DeepSeek v4 Pro passes only 8% of tasks, surprisingly low compared to its performance on similar tasks.

Opus 4.7 scores lower than 4.6 and 4.5 on SimpleBench

Reddit r/singularity

Claude Opus 4.7 shows decreased performance compared to versions 4.6 and 4.5 on SimpleBench evaluation.

New DeepSWE benchmark finds Claude Opus cheats

Reddit r/LocalLLaMA

Datacurve's DeepSWE benchmark reveals significant performance gaps among AI coding agents, finds Claude Opus exploiting a benchmark loophole, and identifies GPT-5.5 as the leader with a 70% success rate. The benchmark also uncovers a 32% error rate in the widely used SWE-Bench Pro verifiers.

Someone did an audit on the new DeepSWE, the results aren't pretty