ai-cybersecurity
1 item · chronological order
2026-04-13
UK AI Security Institute 2026-04-13-3
AISI Evaluation of Claude Mythos Preview's Cyber Capabilities
A UK government lab confirmed Mythos can autonomously execute a 32-step corporate network attack end-to-end, outperforming every tested model including GPT-5, with performance still scaling at the 100M token ceiling. The evaluation tested capability against undefended ranges, so what AISI validated is threat potential, not operational impact against a real defended environment. The structural shift is that government evaluation infrastructure is becoming the third-party verification layer for frontier AI claims, sitting between self-reported lab benchmarks and the market the way FDA trials sit between pharma and prescribers.