Claude Opus 4.8 scores over 1% on ARC-AGI 3 !!

Reddit r/singularity 06/01/26, 07:14 PM Models

Summary

Claude Opus 4.8 achieves a score of over 1% on the ARC-AGI 3 benchmark, demonstrating slight progress on a difficult AI reasoning test.

No content available

Original Article

Similar Articles

Claude Opus 4.8 says it's the only model that finished every case on the Super-Agent benchmark. Anyone run it on real agents yet?

Reddit r/AI_Agents

Anthropic released Claude Opus 4.8, claiming it is the only model to complete every case on the Super-Agent benchmark and that it outperforms GPT-5.5 on browser/computer use tasks with better tool efficiency and fewer uncorrected code flaws.

Opus 4.8 just broke ARC-AGI-3 (1 minute read)

TLDR AI

A new benchmark called LisanBench evaluates LLMs on word chain tasks requiring planning, memory, and constraint adherence, with results showing strong performance from o3 and Anthropic models.

Introducing Claude Opus 4.7

Anthropic News

Anthropic has released Claude Opus 4.7, a new AI model featuring significant improvements in advanced software engineering, vision capabilities, and self-verification. The release includes specific cybersecurity safeguards and is available via API and major cloud providers.

@orca_build: Anthropic’s new Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1… …but it’s noticeably better at UI tasks.…

X AI KOLs Timeline

Anthropic's Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1 but excels at UI tasks; Orca's orchestration enables Codex to delegate UI tasks to Claude Code.

@mfpiccolo: Opus 4.8 is out. Here is the the verdict from @iiidevs lead engineer: did a stress test it’s just another llm can’t rea…