I am playing with Gemini for automatically extracting MITRE ATT&CK techniques from cybersecurity incident reports (MITRE ATT&CK is a powerful framework for reasoning about attacks, and I use it intensively in my Cybersecurity course): you give Gemini the URL of a report and immediately obtain the attack techniques used in that campaign. Here is a spreadsheet with some of the outputs. This usage of "the AI" is potentially very useful for quickly grasping the essentials of an attack campaign and providing students with concrete examples.
It is also a usage that satisfies an essential but often overlooked requirement of AI applications: the cost of a mistake must be small.
The prompt I give to Gemini also asks it to extract another important piece of information: the vulnerabilities possibly exploited in that campaign.
In my early attempts I asked Gemini to state, for each listed vulnerability, whether it was still unknown to the software manufacturer at the time of exploitation (i.e., whether it was a "zero-day"). I quickly realized that Gemini's answers are basically unreliable in this respect. This is a report by Google containing a table that lists all the vulnerabilities exploited in that campaign, with a column stating which of them were zero-days. The table identifies 7 vulnerabilities, 3 of which were exploited as zero-days in that campaign. Despite this easily identifiable and easily extractable information, Gemini produced a list of vulnerabilities with none of them marked as a zero-day.
Today I discovered that even the list of vulnerabilities is unreliable. I have just analyzed The n8n n8mare: How threat actors are misusing AI workflow automation. Gemini claimed that "The campaign significantly leveraged a chain of vulnerabilities in self-hosted n8n instances to move from external probes to full Remote Code Execution (RCE)," and then provided a table listing CVE-2026-21858, CVE-2025-68613, CVE-2026-21877.
Well, this is completely wrong. None of these vulnerabilities has anything to do with the attack campaign described in the report. They affect the same software considered in the report, and they have been exploited in recent months, but in different campaigns. Not in this one. I have read the report multiple times to check. If I am wrong, please let me know.
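One cheap guardrail for this failure mode is to verify that every CVE identifier the model returns actually appears verbatim in the source report before trusting the output. This catches exactly the error above: CVEs that are real but simply never mentioned in the report. A minimal sketch (the report text and the helper function are hypothetical illustrations, not part of my actual workflow):

```python
import re

# CVE IDs have the form CVE-YYYY-NNNN (four or more final digits).
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")

def verify_cves(model_cves, report_text):
    """Split the model-reported CVE IDs into those literally present
    in the report text and those with no textual support."""
    in_report = set(CVE_PATTERN.findall(report_text))
    confirmed = [c for c in model_cves if c in in_report]
    unsupported = [c for c in model_cves if c not in in_report]
    return confirmed, unsupported

# Hypothetical example: the report mentions one CVE; the model claims two.
report = "The actors exploited CVE-2025-1111 against exposed hosts."
claimed = ["CVE-2025-1111", "CVE-2026-21858"]
ok, bad = verify_cves(claimed, report)
print(ok)   # ['CVE-2025-1111']
print(bad)  # ['CVE-2026-21858']
```

Of course this only checks that an identifier occurs somewhere in the text, not that the report attributes it to this campaign; but a non-empty `unsupported` list is already a strong signal that the answer should not be trusted.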
Maybe one could try to fix the prompt, but that is not the point. The point is that the answer was wrong and was expressed with full confidence.
Luckily, I was just more or less playing. But what if I had used that response to draft an expert report for a criminal trial? What if I were an IT manager and, based on that response, had decided to give the highest priority to mitigating those vulnerabilities?
So, please never forget: whenever you use "the AI" and you are not able to quickly spot its mistakes, the cost of a mistake must be small. Otherwise, do not use it.