More evidence of Mythos's strength in cybersecurity/hacking: it got 18/41 n-day exploits vs. 1/41 for 5.5. Open-source/open-weights models got nothing

Reddit r/singularity Models

Summary

Mythos shows strong performance on cybersecurity exploitation tasks, achieving 18 of 41 n-day exploits (about 44%) compared with 1 of 41 (about 2%) for 5.5, while open-source and open-weights models achieved none.

https://x.com/i/status/2055314585058693601
Original Article

Similar Articles

Cloudflare just published what they found after running Anthropic's Mythos Preview against 50+ of their own repos and the results are worth reading

Reddit r/artificial

Cloudflare shares its experience with Anthropic's Mythos Preview model, which autonomously discovered high-severity vulnerabilities across major operating systems and web browsers. The model demonstrates senior-level reasoning in chaining exploit primitives but has inconsistent guardrails, highlighting the need for hardened safeguards before public release.