@AnthropicAI: Each time we release a model, we run the same test: give it code that trains a small AI model, ask the new model to spe…
Summary
Anthropic shares internal benchmark results showing dramatic improvement in AI coding: Claude Opus 4 averaged a ~3x speedup on an ML code optimization task in May 2025, while the new Mythos Preview model achieved a ~52x speedup this April. For comparison, a skilled human needs 4-8 hours to reach a 4x speedup on the same task.
Similar Articles
Mythos can improve speed of training code 52x (compared to human 4x at 4-8hrs)
Anthropic's Mythos system achieved a 52x speedup when optimizing training code, compared with the 4x speedup a human reaches in 4-8 hours on the same task. The caveat is that absolute multiples depend heavily on the quality of the starting code. On a like-for-like basis, the speedup rose from roughly 3x to 52x across models over the past year.
@AnthropicAI: AI research is a series of next-step decisions. We looked at sessions where a human researcher took a wrong turn, showe…
Anthropic's Mythos Preview model outperformed human researchers in correcting wrong-turn decisions 64% of the time, a major improvement from 22% in 2024, showcasing Claude's advancing research assistance capabilities.
@AnthropicAI: Correction: Claude Opus 4's ~3x average speedup dates to May 2025, not May 2024. This evaluation has only existed since…
Anthropic issued a correction clarifying that Claude Opus 4's ~3x average speedup dates to May 2025, not May 2024, and that earlier models from May 2024 showed no speedup on the backtested evaluation.
Claude Mythos, Deepseek v4, HappyHorse, Meta’s new AI, realtime video games: AI NEWS
Anthropic unveils a withheld Claude Mythos model that autonomously finds thousands of 0-days, ZAI open-sources the 1.5 TB GLM-5.1 that tops open-weight benchmarks, Alibaba’s unreleased HappyHorse video model hits #1 on public leaderboards, and Deepseek teases an “Expert Mode” v4 preview.
Anthropic likely to release Mythos in the "near future"
Anthropic is expected to release a new AI model called Mythos in the near future.