Small, Cheap AI Models Match Mythos on Flagship Vulnerability Finds

Security firm AISLE ran the vulnerabilities Anthropic showcased for Mythos through small, open-weights models and found that they recovered much of the same analysis. All eight models tested detected the flagship FreeBSD exploit, including a 3.6B-parameter model costing $0.11 per million tokens. A model with 5.1B active parameters reproduced the core chain behind the 27-year-old OpenBSD bug. On a basic security reasoning benchmark, small open models outperformed most frontier models from every major lab.

AISLE argues that AI cybersecurity capability is “jagged”: it doesn’t scale smoothly with model size or price, and the rankings reshuffle completely from one task type to another. There is no stable best model. The real pipeline decomposes into broad-spectrum scanning, vulnerability detection, triage, patch generation, and exploit construction, each with different scaling properties. AISLE’s own system has produced over 180 validated CVEs across 30+ projects using multiple model families, with Anthropic’s models not consistently on top.
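To make the "no stable best model" point concrete, here is a minimal sketch of a pipeline that routes each stage to its own model. The stage names follow the list above; everything else (`run_pipeline`, `make_stub`, the `Finding` type, and the stub model names) is hypothetical and does not describe AISLE's actual system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stage names follow the article; the short identifiers are illustrative.
STAGES = ["scan", "detect", "triage", "patch", "exploit"]

# Assumption: a "model" is any callable taking (stage, context) and returning text.
Model = Callable[[str, str], str]


@dataclass
class Finding:
    target: str
    notes: List[str] = field(default_factory=list)


def run_pipeline(target: str, routing: Dict[str, Model]) -> Finding:
    """Run every stage in order, using whichever model is routed to that stage."""
    finding = Finding(target=target)
    context = target
    for stage in STAGES:
        output = routing[stage](stage, context)
        finding.notes.append(f"{stage}: {output}")
        context = output  # each stage sees the previous stage's output
    return finding


def make_stub(name: str) -> Model:
    """Hypothetical stand-in for a real model call."""
    return lambda stage, context: f"{name}/{stage}"


# Route most stages to a small model and one stage to a larger one.
routing = {stage: make_stub("small") for stage in STAGES}
routing["exploit"] = make_stub("large")
result = run_pipeline("parser.c", routing)
```

Because the routing table is plain data, swapping the model used for a single stage does not touch the rest of the pipeline, which is one way to accommodate rankings that change by task.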

The central thesis: the moat in AI cybersecurity is the orchestration system (targeting, iterative deepening, validation, triage, and maintainer trust), not any single model. Cheap models deployed broadly with expert scaffolding can outperform one expensive model that has to guess where to look. Mythos validates that the category is real, but the open question remains how to make it work at scale with maintainer trust, and that problem isn’t model-bound.
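A rough way to see the breadth argument is to compare how many tokens a fixed budget buys at two price points. In the sketch below, only the $0.11 per million tokens figure comes from the article; the second price is a hypothetical placeholder, not a quoted rate for any model, and token volume is only a proxy for coverage, not for the quality of findings.

```python
def tokens_for_budget(budget_usd: float, usd_per_million_tokens: float) -> float:
    """Number of tokens a budget buys at a given per-million-token price."""
    return budget_usd / usd_per_million_tokens * 1_000_000


def coverage_ratio(cheap_price: float, expensive_price: float) -> float:
    """How many times more tokens the cheaper price buys for the same spend."""
    return expensive_price / cheap_price


CHEAP_PRICE = 0.11                   # USD per million tokens, from the article
HYPOTHETICAL_EXPENSIVE_PRICE = 15.0  # placeholder assumption, not a real quote

cheap_tokens = tokens_for_budget(11.0, CHEAP_PRICE)
ratio = coverage_ratio(CHEAP_PRICE, HYPOTHETICAL_EXPENSIVE_PRICE)
```

The ratio depends entirely on the placeholder price, so treat it as an illustration of the calculation, not as a measured result.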