New Mythos checkpoint shows continued improvement: “On a 32-step corporate network attack we estimate takes a human expert ~20 hours, this checkpoint completes the full attack in 6 /10 attempts.”
Summary
Mythos releases a new checkpoint that can complete a 32-step corporate network attack in 6 out of 10 attempts, compared to ~20 hours for a human expert.
Similar Articles
More evidence of Mythos's strength in Cybersecurity/Hacking - compared to 5.5, it got 18/41 n-day exploits, vs 1/41. Open Source/Weights models get nothing
Mythos demonstrates strong performance in cybersecurity hacking, achieving 18 out of 41 n-day exploits compared to 1 for version 5.5, while open-source models get none.
Mythos can improve speed of training code 52x (compared to human 4x at 4-8hrs)
Anthropic's Mythos system achieved a 52x speedup in optimizing training code compared to a human's 4x speedup over 4-8 hours on the same task, with the caveat that absolute multiples depend heavily on starting code quality. The like-for-like comparison shows roughly 3x–52x improvement across models over the past year.
Cloudflare Warns Mythos AI Can Build Real Cyberattacks Ahead of AI Giant's G20 Briefing
Cloudflare's testing of Anthropic's Mythos Preview reveals the model can chain multiple low-severity bugs into working exploits, a major step in offensive cybersecurity AI, as Anthropic prepares to brief G20 officials on related risks.
@logangraham: A lot of people have been wondering about Mythos, Glasswing, and the vulns we / our partners are fixing. Today, I’m exc…
Anthropic's Claude Mythos Preview model has been evaluated by XBOW and UK AISI, showing unprecedented autonomous cybersecurity capabilities, including solving end-to-end cyber ranges and finding thousands of vulnerabilities. The announcement emphasizes the need to prepare for rapidly advancing AI capabilities in cybersecurity.
Project Glasswing: what Mythos showed us
Cloudflare tested Anthropic's Mythos Preview LLM, designed for security vulnerability research, and found it capable of chaining multiple bugs into exploits and generating working proofs, representing a significant advancement over general-purpose frontier models.