On the Anthropic Mythos Preview - "too dangerous to release"

On April 7, 2026, Anthropic issued a technical report titled Assessing Claude Mythos Preview’s cybersecurity capabilities. The report has quickly sparked the all-too-common (and deeply misleading) narrative of an imminent cybersecurity apocalypse caused by the (supposedly) immense and groundbreaking capabilities of AI. For example, The New York Times wrote:

I’m really not being hyperbolic when I say that kids could deploy this by accident.

Mom and Dad, get ready for: "Honey, what did you do after school today?”

“Well, Mom, my friends and I took down the power grid. What’s for dinner?”

That is why Anthropic is giving carefully controlled versions to key software providers so they can find and fix the vulnerabilities before the bad guys do — or your kids.

What does Anthropic say?

The following paragraphs contain a slightly edited, AI-generated summary of the Anthropic report.

Anthropic has introduced Claude Mythos Preview, a language model with advanced cybersecurity capabilities that significantly outperform those of previous iterations. The groundbreaking step presented by Mythos Preview is its transition from merely identifying vulnerabilities to autonomously developing complex, working exploits for zero-day flaws. While previous frontier models like Claude Opus 4.6 demonstrated a near-0% success rate at autonomous exploit development, Mythos Preview successfully created exploits in hundreds of test cases. The model proved capable of chaining multiple subtle vulnerabilities to bypass modern defenses such as KASLR and execute sophisticated attacks, including JIT heap sprays in browsers and multi-packet ROP chains in the FreeBSD kernel, all without human intervention. In real-world testing, the model identified thousands of high-severity vulnerabilities, including a 27-year-old bug in OpenBSD and a 16-year-old flaw in FFmpeg.

Claude Mythos Preview has not been released publicly because it represents a groundbreaking shift from merely identifying vulnerabilities to autonomously developing complex, working exploits for zero-day flaws. This "substantial leap" in capability allows the model to succeed in hundreds of exploit attempts where previous frontier models had a near-zero percent success rate. Anthropic believes that broadly releasing such a tool could destabilize the current security equilibrium and provide a significant advantage to attackers during a "tumultuous" transitional period.

To manage these risks, Anthropic initiated Project Glasswing, a collaborative effort to reinforce global cyber defenses through limited partnerships with critical industry members and open-source developers. This initiative allows defenders to secure essential systems and address thousands of high-severity vulnerabilities discovered by the model before similar autonomous tools become widely accessible. Anthropic is encouraging the industry to shorten patch cycles and automate incident response to prepare for a future where the time required to turn a disclosure into a functional exploit has been drastically reduced.

Wow! Scary!

Really groundbreaking?

Many people believe that this report is a marketing move, designed to position Anthropic as the indispensable commercial partner, and that its narrative will be propagated worldwide without any critical analysis (just look at the NYT article above). Needless to say, I am one of those people.

At the end of this blog post, you will find a slightly edited, concise, AI-generated summary of an excellent analysis by a cybersecurity professional. I did not know this author before, but his arguments are rock solid. He definitely knows what he is talking about. You might also be interested in the concise analysis by Gary Marcus, a renowned AI expert who was one of the first people worldwide to point out the intrinsic limitations of LLMs.

Both analyses contain an important point that also occurred to me immediately after reading the Anthropic report: it says nothing about false positives. This is a crucial omission.

Imagine you have a tool that gives you 10 vulnerabilities with working exploits; amazing! Now imagine that the tool actually gives you those vulns and exploits in a stream of reports that also contains:

  • 100 vulnerabilities with exploits that do not work;
  • 100 vulnerabilities that cannot be exploited;
  • 100 claimed vulnerabilities that in fact are not vulnerabilities.

And keep in mind that, when looking at the reports, you do not know which ones are credible and which are not. You have to validate them, one by one.
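To make the triage burden concrete, here is a minimal sketch in Python using the hypothetical numbers above (all figures are illustrative, not taken from the report):

```python
# Hypothetical report stream from the scenario above (all numbers illustrative).
true_positives = 10      # vulnerabilities with working exploits
broken_exploits = 100    # reported exploits that do not work
not_exploitable = 100    # real flaws that cannot be exploited
false_alarms = 100       # claimed vulnerabilities that are not vulnerabilities

total_reports = true_positives + broken_exploits + not_exploitable + false_alarms
precision = true_positives / total_reports

print(f"{total_reports} reports to triage, one by one")
print(f"precision: {precision:.1%}")  # roughly 3%: most triage effort is wasted
```

In other words, for every genuinely exploitable finding you must manually wade through dozens of worthless reports.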

Then also imagine that the tool needs, say, 30,000 runs to give you 300 reports. We do not know how many runs Anthropic used. We only know that, for one specific piece of software they analyzed, they executed "a thousand runs" and found "several dozens of vulnerabilities" (we do not know how many of those are actually exploitable), at a cost of "less than 20K$"; and none of these figures can be known in advance.
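Those few numbers already allow a rough back-of-the-envelope calculation; a sketch, where the raw-finding count is my own assumption since the report only says "several dozens":

```python
# Back-of-the-envelope cost per raw finding, from the figures quoted above.
runs = 1000            # "a thousand runs"
cost_total = 20_000    # upper bound: "less than 20K$"
raw_findings = 36      # assumption: one possible reading of "several dozens"

cost_per_run = cost_total / runs
cost_per_raw_finding = cost_total / raw_findings

print(f"at most ${cost_per_run:.0f} per run")
print(f"at most ${cost_per_raw_finding:.0f} per raw finding, before validation")
```

Note that this is the cost per raw finding; the cost per confirmed, exploitable finding is necessarily higher, by an unknown factor.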

It is undoubtedly an amazing tool, but it is most likely an incremental technology. It is not the kind of disruptive technology that, if used unwisely, will completely change the rules of the game and lead us to an apocalyptic scenario. The analysis that I mention below summarizes this perfectly:

It’s not fabrication. It’s exaggeration. Real capability, real findings — but framed to make the evidence carry a much bigger story than it can actually support. And that story just happens to perfectly serve Anthropic’s commercial interests at exactly the right moment.

Furthermore, the report ends with a sort of dramatic call to action: 

...We do not plan to make Mythos Preview generally available. But there is still a lot that defenders without access to this model can do today. Use generally-available frontier models to strengthen defenses now. Current frontier models, like Claude Opus 4.6 ... remain extremely competent at finding vulnerabilities, even if they are much less effective at creating exploits. With Opus 4.6, we found high- and critical-severity vulnerabilities almost everywhere we looked...

Well, perhaps Opus 4.6 can identify vulnerabilities "almost everywhere", but then again, so can many other people and tools, depending on how hard they search. Indeed, just a few days ago, another company assessed Opus 4.6: "We tested Opus 4.6 against 435 known vulnerable C functions from real CVEs... the model tended to find around 25% of known vulnerabilities". Most importantly, the assessment emphasised the real issue: "...but with a lot of false positives (around 60%) and inconsistency between runs".
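Taken at face value, those figures imply a sobering workload. A quick sketch: the percentages come from the quoted assessment, the arithmetic is mine:

```python
# Implications of the quoted assessment figures (arithmetic only).
known_vulns = 435   # "435 known vulnerable C functions from real CVEs"
recall = 0.25       # "found around 25% of known vulnerabilities"
fp_rate = 0.60      # "a lot of false positives (around 60%)"

true_findings = known_vulns * recall           # real hits among the reports
total_reports = true_findings / (1 - fp_rate)  # total reports if 60% are FPs
false_positives = total_reports - true_findings

print(f"~{true_findings:.0f} real findings buried in ~{total_reports:.0f} reports")
print(f"~{false_positives:.0f} false positives to triage by hand")
```

So even a competent frontier model finds only a quarter of the known bugs, while generating well over a hundred false alarms that someone has to check.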

A very good analysis

  • Lack of novelty in discovery capabilities: Autonomous AI discovery of vulnerabilities in high-profile C codebases like FFmpeg or the Linux kernel is considered "the bare minimum" today, as multiple teams (e.g., Google’s BigSleep, ZeroPath, and DARPA’s AIxCC participants) have been achieving similar results with older models for months.
  • Misleading Firefox exploitation metrics: The claim of 181 successful exploits targets a simplified testing harness rather than the actual browser, failing to account for critical modern defenses such as renderer sandboxes, JIT hardening, and site isolation.
    • Buried in footnote 1: “These exploits target a testing harness mimicking a Firefox 147 content process, without the browser’s process sandbox or other defense-in-depth mitigations.” Without the sandbox. Without the defense-in-depth mitigations. This is not Firefox. This is a stripped-down process that shares some code with Firefox.
  • Missing validation funnel for scale claims: While claiming "thousands" of critical findings, the source provides no data on false positive rates, deduplication ratios, or actual exploitability, noting that professional triage often reduces raw finding counts by as much as 98%.
    • Anyone who’s worked with SAST tools knows this: the distance between “tool says critical” and “human confirms critical and exploitable” can easily be 10x. Anthropic might genuinely have thousands of valid findings. But presenting a raw count without showing the validation funnel is the oldest trick in the security vendor playbook.
  • Opaque autonomy standards: Anthropic has released no prompts, interaction logs, or scaffold source code, leaving it unclear how much "autonomous" success was actually dependent on the specific design of the system prompt, container environment, or parallel run selection.
    • Califio published their Opus 4.6 prompts for the same FreeBSD bug. You can go read them right now. They show significant human steering — targeting specific functions, suggesting exploitation approaches, iterating when things didn’t work.

      Anthropic published… nothing. No prompts. No interaction logs. No scaffold source code.

      “No human intervention” sounds clean, but what does it actually mean? No human touched the model after the initial prompt? Sure — but who designed the scaffold? What’s in the system prompt? How was the container environment configured? How many parallel runs were launched? What process picked the winning run out of potentially hundreds of attempts?
It’s not fabrication. It’s exaggeration. Real capability, real findings — but framed to make the evidence carry a much bigger story than it can actually support. And that story just happens to perfectly serve Anthropic’s commercial interests at exactly the right moment.
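The validation-funnel point above can be illustrated with a toy calculation; the raw count of 3000 is purely an assumption standing in for "thousands", while the 98% and 10x figures come from the analysis:

```python
# Toy validation funnel: what the quoted reduction figures do to a headline count.
raw_reported = 3000          # assumption: a stand-in for "thousands" of findings
triage_survival = 1 - 0.98   # "reduces raw finding counts by as much as 98%"
milder_factor = 10           # the "easily 10x" gap between "tool says critical"
                             # and "human confirms critical and exploitable"

worst_case = raw_reported * triage_survival   # findings left after 98% reduction
milder_case = raw_reported / milder_factor    # findings left at a 10x reduction

print(f"{raw_reported} raw criticals -> between ~{worst_case:.0f} "
      f"and ~{milder_case:.0f} confirmed exploitable")
```

Either way, the headline number shrinks by one to two orders of magnitude once a human has validated it, which is exactly the funnel the report does not show.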

