Cloudflare puts Anthropic's Mythos Preview to work hunting bugs in its own code
Cloudflare tested Anthropic’s security-focused Mythos Preview model against more than fifty of its own repositories under Project Glasswing, and reports a meaningful capability jump over general-purpose frontier models. Two behaviors stand out. The first is exploit chain construction: stitching small primitives such as use-after-free bugs into ROP-driven control-flow hijacks. The second is self-contained proof generation, where the model writes, compiles, and runs trigger code in a scratch environment, then iterates when its hypothesis fails. Other frontier models found many of the same underlying bugs but stalled at the assembly step, leaving exploitability unproven.
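The write, run, and iterate loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Cloudflare's or Anthropic's harness: `validate_hypothesis`, `propose`, and `triggered` are hypothetical names. `propose` stands in for the model and receives the previous attempt's output as feedback; `triggered` is an oracle that decides whether a run reproduced the bug. To keep the sketch self-contained, each trigger is a Python script run with the current interpreter rather than compiled native code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def validate_hypothesis(
    propose: Callable[[Optional[str]], str],
    triggered: Callable[[subprocess.CompletedProcess], bool],
    max_attempts: int = 3,
    timeout_s: float = 10.0,
) -> Optional[str]:
    """Hypothetical sketch: return trigger source that reproduces, else None."""
    feedback: Optional[str] = None  # no feedback before the first attempt
    # Every attempt runs inside a throwaway scratch directory.
    with tempfile.TemporaryDirectory() as scratch:
        for attempt in range(max_attempts):
            source = propose(feedback)
            script = Path(scratch) / f"trigger_{attempt}.py"
            script.write_text(source)
            try:
                result = subprocess.run(
                    [sys.executable, str(script)],
                    capture_output=True,
                    text=True,
                    timeout=timeout_s,
                    cwd=scratch,
                )
            except subprocess.TimeoutExpired:
                feedback = "timeout"  # treat a hang as a failed hypothesis
                continue
            if triggered(result):
                return source  # proof reproduced: keep this trigger
            # Hypothesis failed: feed the observed output into the next try.
            feedback = result.stderr or result.stdout
    return None  # attempts exhausted, exploitability unproven
```

Returning `None` instead of an unconfirmed guess mirrors the distinction the article draws: a bug without a working trigger stays unproven.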
The preview build shipped without the safeguards present in generally available models, yet exhibited emergent refusals on legitimate research tasks. Those refusals proved inconsistent: identical requests produced different answers across runs or after unrelated environmental changes, which Cloudflare argues is exactly why any future generally available cyber-capable frontier model needs deliberate safety layers on top of this baseline.
On signal-to-noise, Cloudflare identifies two persistent drivers of false positives: memory-unsafe languages like C/C++, and models' tendency to report findings whether or not a bug exists. Mythos Preview reduces triage cost because its findings tend to arrive with working PoCs and clearer reproduction steps. Cloudflare also pushes back on the instinct to point a generic coding agent at a repo. Vulnerability research is narrow and parallel, not the single-hypothesis loop coding agents are tuned for, which is why purpose-built harnesses and the architecture around the model matter as much as the model itself.
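A harness organized around many narrow hypotheses, rather than one long agent loop, might fan them out in parallel and surface only those that come back with a reproducing PoC. The sketch below is an assumption-laden illustration, not Cloudflare's design: `Hypothesis`, `Finding`, `run_campaign`, and the `investigate` callback (which returns PoC source or `None`) are all hypothetical. It uses a thread pool on the assumption that each investigation mostly waits on external work such as model calls or test runs.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Hypothesis:
    target: str     # hypothetical scope, e.g. a single function or file
    bug_class: str  # hypothetical label, e.g. "use-after-free"


@dataclass(frozen=True)
class Finding:
    hypothesis: Hypothesis
    poc: str  # trigger source that reproduced the bug


def run_campaign(
    hypotheses: Iterable[Hypothesis],
    investigate: Callable[[Hypothesis], Optional[str]],
    max_workers: int = 8,
) -> List[Finding]:
    """Investigate narrow hypotheses in parallel; keep only PoC-backed ones."""
    hyps = list(hypotheses)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        pocs = list(pool.map(investigate, hyps))
    # Drop anything without a working PoC instead of passing it to triage.
    return [Finding(h, poc) for h, poc in zip(hyps, pocs) if poc is not None]
```

Gating on a reproducing PoC trades some recall for much less triage noise, which is the trade-off the article attributes to PoC-backed findings.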
This is an AI-generated summary. Read the original, via Hacker News, for the full story.