On the Anthropic Mythos Preview - "too dangerous to release"

On April 7, 2026, Anthropic issued a technical report titled Assessing Claude Mythos Preview’s cybersecurity capabilities. The report has quickly sparked the all-too-common (and deeply misleading) narrative of an imminent cybersecurity apocalypse caused by the (supposedly) immense and groundbreaking capabilities of AI. For example, The New York Times wrote:

I’m really not being hyperbolic when I say that kids could deploy this by accident.

Mom and Dad, get ready for: "Honey, what did you do after school today?”

“Well, Mom, my friends and I took down the power grid. What’s for dinner?”

That is why Anthropic is giving carefully controlled versions to key software providers so they can find and fix the vulnerabilities before the bad guys do — or your kids.

What does Anthropic say?

The following paragraphs contain a slightly edited, AI-generated summary of the Anthropic report.

Anthropic has introduced Claude Mythos Preview, a language model with advanced cybersecurity capabilities that significantly outperform those of previous iterations. The groundbreaking step presented by Mythos Preview is its transition from merely identifying vulnerabilities to autonomously developing complex, working exploits for zero-day flaws. While previous frontier models like Claude Opus 4.6 demonstrated a near-0% success rate at autonomous exploit development, Mythos Preview successfully created exploits in hundreds of test cases. The model proved capable of chaining multiple subtle vulnerabilities to bypass modern defenses such as KASLR and execute sophisticated attacks, including JIT heap sprays in browsers and multi-packet ROP chains in the FreeBSD kernel, all without human intervention. In real-world testing, the model identified thousands of high-severity vulnerabilities, including a 27-year-old bug in OpenBSD and a 16-year-old flaw in FFmpeg.

Claude Mythos Preview has not been released publicly because it represents a groundbreaking shift from merely identifying vulnerabilities to autonomously developing complex, working exploits for zero-day flaws. This "substantial leap" in capability allows the model to succeed in hundreds of exploit attempts where previous frontier models had a near-zero percent success rate. Anthropic believes that broadly releasing such a tool could destabilize the current security equilibrium and provide a significant advantage to attackers during a "tumultuous" transitional period.

To manage these risks, Anthropic initiated Project Glasswing, a collaborative effort to reinforce global cyber defenses through limited partnerships with critical industry members and open-source developers. This initiative allows defenders to secure essential systems and address thousands of high-severity vulnerabilities discovered by the model before similar autonomous tools become widely accessible. Anthropic is encouraging the industry to shorten patch cycles and automate incident response to prepare for a future where the time required to turn a disclosure into a functional exploit has been drastically reduced.

Wow! Scary!

Really groundbreaking?

Many people believe that this report is a marketing move, designed to position Anthropic as the indispensable commercial partner, and that its narrative will be propagated worldwide without any critical analysis (just look at the NYT article above). Needless to say, I am one of those people.

At the end of this blog post, you will find a slightly edited, concise, AI-generated summary of an excellent analysis by a cybersecurity professional. I did not know this author before, but his arguments are rock solid. He definitely knows what he is talking about. You might also be interested in the concise analysis by Gary Marcus, a renowned AI expert who was one of the first people worldwide to point out the intrinsic limitations of LLMs.

Both analyses contain an important point that also occurred to me immediately after reading the Anthropic report: it says nothing about false positives. This is a crucial omission.

Imagine you have a tool that gives you 10 vulnerabilities with working exploits; amazing! Now imagine that the tool actually gives you those vulns and exploits in a stream of reports that also contains:

  • 100 vulnerabilities with exploits that do not work;
  • 100 vulnerabilities that cannot be exploited;
  • 100 claimed vulnerabilities that in fact are not vulnerabilities.

And keep in mind that, when looking at the reports, you do not know which ones are credible and which are not. You have to validate them, one by one.
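To make the triage burden concrete, here is a minimal sketch in Python using the hypothetical numbers above (all figures are illustrative, not taken from the report):

```python
# Hypothetical report stream from the scenario above (all numbers illustrative).
true_positives = 10      # vulnerabilities with working exploits
broken_exploits = 100    # reported exploits that do not work
not_exploitable = 100    # real flaws that cannot be exploited
false_alarms = 100       # claimed vulnerabilities that are not vulnerabilities

total_reports = true_positives + broken_exploits + not_exploitable + false_alarms
precision = true_positives / total_reports

print(f"{total_reports} reports to triage, one by one")
print(f"precision: {precision:.1%}")  # roughly 3%: most triage effort is wasted
```

In other words, for every genuinely exploitable finding you must manually wade through dozens of worthless reports.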

Then also imagine that the tool needs, say, 30,000 runs to give you 300 reports. We do not know how many runs Anthropic used. We only know that, for one specific piece of software they analyzed, they executed "a thousand runs" and found "several dozens of vulnerabilities" (we do not know how many of those are actually exploitable), at a cost of "less than 20K$"; and none of these figures can be known in advance.
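Those few numbers already allow a rough back-of-the-envelope calculation; a sketch, where the raw-finding count is my own assumption since the report only says "several dozens":

```python
# Back-of-the-envelope cost per raw finding, from the figures quoted above.
runs = 1000            # "a thousand runs"
cost_total = 20_000    # upper bound: "less than 20K$"
raw_findings = 36      # assumption: one possible reading of "several dozens"

cost_per_run = cost_total / runs
cost_per_raw_finding = cost_total / raw_findings

print(f"at most ${cost_per_run:.0f} per run")
print(f"at most ${cost_per_raw_finding:.0f} per raw finding, before validation")
```

Note that this is the cost per raw finding; the cost per confirmed, exploitable finding is necessarily higher, by an unknown factor.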

It is undoubtedly an amazing tool, but it is most likely an incremental technology. It is not the kind of disruptive technology that, if used unwisely, will completely change the rules of the game and lead us to an apocalyptic scenario. The analysis that I mention below summarizes this perfectly:

It’s not fabrication. It’s exaggeration. Real capability, real findings — but framed to make the evidence carry a much bigger story than it can actually support. And that story just happens to perfectly serve Anthropic’s commercial interests at exactly the right moment.

Furthermore, the report ends with a sort of dramatic call to action: 

...We do not plan to make Mythos Preview generally available. But there is still a lot that defenders without access to this model can do today. Use generally-available frontier models to strengthen defenses now. Current frontier models, like Claude Opus 4.6 ... remain extremely competent at finding vulnerabilities, even if they are much less effective at creating exploits. With Opus 4.6, we found high- and critical-severity vulnerabilities almost everywhere we looked...

Well, perhaps Opus 4.6 can identify vulnerabilities "almost everywhere", but then again, so can many other people and tools, depending on how hard they search. Indeed, just a few days ago, another company assessed Opus 4.6: "We tested Opus 4.6 against 435 known vulnerable C functions from real CVEs... the model tended to find around 25% of known vulnerabilities". Most importantly, the assessment emphasised the real issue: "...but with a lot of false positives (around 60%) and inconsistency between runs".
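Taken at face value, those figures imply a sobering workload. A quick sketch: the percentages come from the quoted assessment, the arithmetic is mine:

```python
# Implications of the quoted assessment figures (arithmetic only).
known_vulns = 435   # "435 known vulnerable C functions from real CVEs"
recall = 0.25       # "found around 25% of known vulnerabilities"
fp_rate = 0.60      # "a lot of false positives (around 60%)"

true_findings = known_vulns * recall           # real hits among the reports
total_reports = true_findings / (1 - fp_rate)  # total reports if 60% are FPs
false_positives = total_reports - true_findings

print(f"~{true_findings:.0f} real findings buried in ~{total_reports:.0f} reports")
print(f"~{false_positives:.0f} false positives to triage by hand")
```

So even a competent frontier model finds only a quarter of the known bugs, while generating well over a hundred false alarms that someone has to check.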

A very good analysis

  • Lack of novelty in discovery capabilities: Autonomous AI discovery of vulnerabilities in high-profile C codebases like FFmpeg or the Linux kernel is considered "the bare minimum" today, as multiple teams (e.g., Google’s BigSleep, ZeroPath, and DARPA’s AIxCC participants) have been achieving similar results with older models for months.
  • Misleading Firefox exploitation metrics: The claim of 181 successful exploits targets a simplified testing harness rather than the actual browser, failing to account for critical modern defenses such as renderer sandboxes, JIT hardening, and site isolation.
    • Buried in footnote 1: “These exploits target a testing harness mimicking a Firefox 147 content process, without the browser’s process sandbox or other defense-in-depth mitigations.” Without the sandbox. Without the defense-in-depth mitigations. This is not Firefox. This is a stripped-down process that shares some code with Firefox.
  • Missing validation funnel for scale claims: While claiming "thousands" of critical findings, the source provides no data on false positive rates, deduplication ratios, or actual exploitability, noting that professional triage often reduces raw finding counts by as much as 98%.
    • Anyone who’s worked with SAST tools knows this: the distance between “tool says critical” and “human confirms critical and exploitable” can easily be 10x. Anthropic might genuinely have thousands of valid findings. But presenting a raw count without showing the validation funnel is the oldest trick in the security vendor playbook.
  • Opaque autonomy standards: Anthropic has released no prompts, interaction logs, or scaffold source code, leaving it unclear how much "autonomous" success was actually dependent on the specific design of the system prompt, container environment, or parallel run selection.
    • Califio published their Opus 4.6 prompts for the same FreeBSD bug. You can go read them right now. They show significant human steering — targeting specific functions, suggesting exploitation approaches, iterating when things didn’t work.

      Anthropic published… nothing. No prompts. No interaction logs. No scaffold source code.

      “No human intervention” sounds clean, but what does it actually mean? No human touched the model after the initial prompt? Sure — but who designed the scaffold? What’s in the system prompt? How was the container environment configured? How many parallel runs were launched? What process picked the winning run out of potentially hundreds of attempts?
It’s not fabrication. It’s exaggeration. Real capability, real findings — but framed to make the evidence carry a much bigger story than it can actually support. And that story just happens to perfectly serve Anthropic’s commercial interests at exactly the right moment.
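The validation-funnel point above can be illustrated with a toy calculation; the raw count of 3000 is purely an assumption standing in for "thousands", while the 98% and 10x figures come from the analysis:

```python
# Toy validation funnel: what the quoted reduction figures do to a headline count.
raw_reported = 3000          # assumption: a stand-in for "thousands" of findings
triage_survival = 1 - 0.98   # "reduces raw finding counts by as much as 98%"
milder_factor = 10           # the "easily 10x" gap between "tool says critical"
                             # and "human confirms critical and exploitable"

worst_case = raw_reported * triage_survival   # findings left after 98% reduction
milder_case = raw_reported / milder_factor    # findings left at a 10x reduction

print(f"{raw_reported} raw criticals -> between ~{worst_case:.0f} "
      f"and ~{milder_case:.0f} confirmed exploitable")
```

Either way, the headline number shrinks by one to two orders of magnitude once a human has validated it, which is exactly the funnel the report does not show.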

