METR evaluated an early version of Claude Mythos
Summary
METR evaluated an early version of Claude Mythos Preview in March 2026 using their time-horizons task suite, estimating a 50%-time-horizon of at least 16 hours, indicating the model is at the upper end of what current benchmarks can measure, with caveats about stability at longer time ranges.
Similar Articles
Apr 29, 2026ScienceEvaluating Claude’s bioinformatics research capabilities with BioMysteryBench
Anthropic researchers evaluate Claude's bioinformatics capabilities using BioMysteryBench, finding that current models perform on par with human experts and sometimes outperform them on complex biological problems.
Claude Mythos, Deepseek v4, HappyHorse, Meta’s new AI, realtime video games: AI NEWS
Anthropic unveils a withheld Claude Mythos model that autonomously finds thousands of 0-days, ZAI open-sources the 1.5 TB GLM-5.1 that tops open-weight benchmarks, Alibaba’s unreleased HappyHorse video model hits #1 on public leaderboards, and Deepseek teases an “Expert Mode” v4 preview.
Apr 30, 2026Societal ImpactsHow people ask Claude for personal guidance
Anthropic presents research on how users seek personal guidance from Claude, highlighting findings on sycophancy rates across domains. The study informed the training of Claude Opus 4.7 and Mythos Preview to better protect user wellbeing.
Hardening Firefox with Claude Mythos Preview
Mozilla details how they used Claude Mythos Preview and other AI models to identify and fix a significant number of latent security bugs in Firefox, demonstrating a shift in the efficacy of AI for code hardening.
Behind the Scenes Hardening Firefox with Claude Mythos Preview
Mozilla used the Claude Mythos preview to systematically find and fix hundreds of security vulnerabilities in Firefox, dramatically increasing their bug-fix rate from around 20-30 per month to 423 in April 2026.