Claude Opus 4.8 scores over 1% on ARC-AGI-3!!

Reddit r/singularity Models

Summary

Claude Opus 4.8 scores over 1% on the ARC-AGI-3 benchmark, showing slight progress on a difficult AI reasoning test.

Similar Articles

Opus 4.8 just broke ARC-AGI-3 (1 minute read)

TLDR AI

A new benchmark called LisanBench evaluates LLMs on word chain tasks requiring planning, memory, and constraint adherence, with results showing strong performance from o3 and Anthropic models.

Introducing Claude Opus 4.7

Anthropic News

Anthropic has released Claude Opus 4.7, a new AI model featuring significant improvements in advanced software engineering, vision capabilities, and self-verification. The release includes specific cybersecurity safeguards and is available via API and major cloud providers.