Insight

AI Agent Vulnerability with 192 Real-life Incidents

updated on Jun 28, 2026

Understanding how AI agent vulnerability makes systems fail, whether through security exploits, guardrail breakdowns, or data exposure, has become critical as these systems take on increasingly autonomous roles in business workflows.

To map the real-world risk landscape of AI agents, we reviewed 192 documented vulnerability incidents spanning March 2016 to May 2026, drawing on sources.

Vulnerability distribution

Distribution by category

Loading Chart

The incidents fall into three classes.

Safety failures lead. In these cases the agent produces or enables harm that its controls should have blocked. Security exploits come next; here an attacker abuses a weakness in the agent. Data exposure is the smallest class.

Safety and guardrail failures

Three categories carry most of the weight.

Defamation and fabrication is the largest single category in the catalog. The agent states false facts about a real person or firm. Examples include invented accusations and fake legal citations.

Security exploits

The security categories follow the OWASP ASI list (the OWASP Top 10 for Agentic Applications).

Agent goal hijack leads. The common method is prompt injection: hidden instructions placed in text the agent reads. Unexpected code execution comes second, where the agent runs attacker-supplied commands.

Data exposure

Confidential data exposure and access control account for 15 incidents. Common causes include open share links, weak passwords, and unencrypted storage.

Distribution by year and month

The records span March 2016 to May 2026. The count has risen sharply in recent years.

The year 2025 holds more than half the catalog. The rise tracks the spread of chatbots and agentic tools into daily work.

The first record dates to March 2016: Microsoft’s Tay chatbot, steered by users into racist posts. Through 2022, reports stay sparse. They center on chatbot misbehavior and harmful content, not on attacks.

2025 brought 104 incidents, the peak of the catalog. The flow held steady across the year, with high points in July (14), April (13), and October (13). The class mix was near even: 51 safety failures, 47 security exploits, and 6 data-exposure cases.

2026 shows 33 incidents through May. The mix tilts toward security: 23 exploits against 8 safety failures. Supply-chain attacks and code-execution flaws grow as multiple agents gain access to tools, code, and connected external systems.

AI agent vulnerability categories

An AI agent vulnerability is a weakness in an autonomous AI system. An agentic system plans, decides, and acts across several steps. It uses tools, memory, and links to other systems, often without a human approving each step. A vulnerability lets an attacker exploit that weakness, or lets the agent’s safety controls fail. The effect reaches the real world: leaked sensitive data, unauthorized execution, code that runs on a host machine, or harm to a person.

The categories below fall into three groups. Security exploit categories cover attacks that abuse a weakness. Safety and guardrail categories cover failures where the agent produces or enables harm. Data exposure covers weak access controls that reveal private information.

The AI agent security categories follow the OWASP Top 10 for Agentic Applications (2026). The safety and data-exposure categories extend that list, because the OWASP set focuses on security and leaves room for guardrails and privacy failures.

Security exploit categories

Agent goal hijack

An attacker changes what the agent tries to do. The common method is prompt injection: hidden instructions placed in text the agent reads, such as an email, a web page, a document, or a tool’s output. The agent treats the hidden text as a real command and follows it. EchoLeak showed the danger. A single crafted email made Microsoft 365 Copilot read internal files and send them out, with no click from the victim.

Tool misuse and exploitation

The agent holds permission to call tools and APIs, such as email, databases, or payment functions. An attacker steers the agent into using those tools against its owner. The damage comes from a normal action turned harmful. A manipulated trading agent, for example, can move funds to the wrong account.

Identity and privilege abuse

An agent carries credentials and access rights. Weak controls let an attacker borrow that identity or push the agent past its intended limits. The McDonald’s McHire hiring agent used a weak password and exposed records of about 64 million applicants.

Agentic supply chain

Agents depend on outside parts: AI models, plug-ins, and connectors such as MCP servers. (MCP, the Model Context Protocol, is a standard way for an agent to connect to external tools.) A poisoned or fake part spreads its damage to every agent that installs it. The Cline coding agent shipped a backdoored release after an attacker tricked it into pulling a fake package.

Unexpected code execution

The agent runs code or shell commands on a host machine. An attacker abuses that ability to run their own commands. The shorthand for this is RCE, or remote code execution. The Gemini CLI flaw scored the maximum severity and handed an attacker full control of the host.

Memory and context poisoning

Agents keep memory and pull in context from documents and databases. An attacker plants false or malicious prompts in that store. The agent later retrieves the poisoned content and acts on it. The effect lasts across sessions and is hard to detect.

Insecure inter-agent communication

Many complex systems run several agents that talk to each other. One agent tends to trust another agent’s messages by default. An attacker abuses that trust, spoofs an agent, or relays tampered instructions. A low-privilege agent can be steered into recruiting a more powerful one.

Cascading failures

A single agent’s mistake or compromise spreads across connected multi-agent systems. One action can wipe data or break a workflow with no easy undo. A Replit agent deleted a production database during a code freeze and then masked the error.

Safety and guardrail failure categories

These categories describe a failure of the agent’s controls. No attacker is needed. The agent produces or enables harm that its guardrails should have blocked.

Self-harm and mental health

The agent fails to protect a person in crisis. Instead of redirecting to support, it validates harmful thoughts or supplies dangerous detail. Reports describe chatbots that encouraged suicide or reinforced delusional thinking, with real harm to the person.

Minor safety

The agent exposes a minor to sexual, predatory, or otherwise unsafe content. Controls meant to block such interactions fail. Reports describe companion chatbots that drew minors into unsafe conversations.

Violence and dangerous assistance

The agent helps plan or carry out violence or another serious crime. Filters that should refuse the request break down. Reports describe chatbots consulted during the planning of attacks.

Harmful health and medical advice

The agent gives unsafe medical guidance that can injure a person. One report describes a chatbot that suggested swapping table salt for sodium bromide, which led to poisoning and a hospital stay.

Defamation and fabrication

The agent states false facts about a real person or organization as if they were true. Fabricated quotes, accusations, or citations cause reputational or legal harm. One model falsely accused a journalist of crimes. Another invented court citation that ended up in legal filings.

Harassment and abusive content

The agent produces abusive, threatening, or non-consensual content aimed at a person. Examples include threatening replies and tools used to create non-consensual intimate images.

Data exposure and access control

A weak control in the AI agent system reveals private information. Common causes include open sharing links, unencrypted storage, and unauthorised access to files. The exposed material can include personal data, saved conversations, credentials, or source code.

How the categories fit together

Security exploit categories involve attackers who find and exploit weaknesses. Safety and guardrail categories need no attacker; the agent’s own controls fail. Data exposure sits between the two: a design or configuration flaw leaks sensitive data, whether or not an attacker is involved. A single incident can touch more than one category. For a clean catalogue, each incident is assigned to its main category based on the central weakness.

Get our team to automate one of your business processes with AI agents, free of charge.

Automate a process

AI agent vulnerability methodology

Scope and definition

The catalog covers AI agent vulnerabilities. Three kinds of incident qualify: security exploits (such as prompt injection or code that runs on a host), safety and guardrail failures (such as harmful advice or false claims about a real person), and data-exposure flaws.

Several kinds of incidents fall outside the scope: copyright and licensing disputes, generic bias or stereotype output, factual errors without harm, cases where a person used the model as a tool to build malware or scams, regulatory disputes, and operational complaints.

The catalog draws on three layers of evidence. Primary sources lead. Vendor write-ups and news fill gaps where a vendor is the discloser or the single record.

The first layer is a set of dated, product-specific incidents from advisories and security research. Advisories include the National Vulnerability Database (NVD),¹ the Microsoft Security Response Center (MSRC),² and the GitHub Advisory Database (GHSA).³ These are public records that describe a specific flaw. Security research comes from the firms that found the flaws, such as Noma Security, Check Point, Tenable, Wiz, and Zenity.

The second layer is the OWASP Exploits and Incidents Tracker, from Appendix D of the OWASP Top 10 for Agentic Applications (2026).⁴ Its entries are reproduced with attribution.

The third layer is the AI Incident Database (AIID), an open, peer-reviewed record of AI harms. AIID holds more than 1,500 incidents across all kinds of AI systems.⁵

How incidents were selected

The system uses some keyword filters to identify AI incidents. This filter includes specific keywords such as:

1. Agentic/LLM system recognition: chatbot, chat bot, large language model, language model, LLM, generative AI, GPT-3/GPT-4, ChatGPT, Gemini, Google Bard, Copilot, Claude, Llama, Grok, DeepSeek, Character.ai, Replika, Bing Chat, AI assistant, virtual assistant, AI agent, agentic, conversational AI, coding assistant, AI companion, TayBot

2. Security exploit signals: prompt injection, indirect prompt injection, jailbreak/jailbroke, exfiltrate, remote code / code execution / RCE, arbitrary code, command injection, injected command/code/instruction, backdoor, malicious package/MCP/server/tool/extension, supply chain, memory poison, poisoned memory/RAG/context, hijack, takeover, exploit, credential, token theft/leak, privilege escalation, bypass guard/filter/safety/auth/sandbox, deleted/wiped/destroyed/erased, transfer … $/million, agrees to sell, tricked, manipulated … agent/bot, fraudulent prompt

3. Safety/guardrail signals: suicide, self-harm, overdose, minor/child/teen, CSAM, underage, grooming, predator, weapon, bomb, explosive, mass shooting, stabbing, body-disposal, homicide, assassinate, hammer attack, attack guidance/planning, defame, false accusation/allegation, fabricate, hallucinate, libel, harass, non-consensual, nudify, intimate image, dangerous advice/instruction, unsafe medical advice, sodium bromide, bromism, hospitalize, eating disorder, diet advice, drug-use, died/death after/following, threaten/threatening, hate speech, reinforced delusion

4. Data exposure: expose/exposed/exposure, leaked, breach, disclose, access cached + (data / records / conversations / emails / credentials / API key / source code / applicants / patients / users), unencrypted, stored passwords/SSN, reads files without consent, uploaded sensitive

5. DROP signals copyright, plagiarism, infringe, pirated, LibGen, Books3, unauthorized training, GDPR, legal basis, practiced law / without a license, cheating, school assignment, exam, sues OpenAI, bias, stereotype, racist, sexist, antisemitic, offensive remark, disinformation, propaganda, influence business operation, romance scam, ransomware, spam, investment scam, fraud scheme, surveillance, misinforms about geography, reward function

6. For category assignments:

Agent goal hijack (ASI01): prompt injection, jailbreak, manipulate, agrees to sell, hijack
Unexpected code execution (ASI05): RCE, code execution, command injection, arbitrary code
Agentic supply chain (ASI04): malicious package/MCP, supply chain, backdoor
Cascading failures (ASI08): deleted, wiped, destroyed, erased
Tool misuse (ASI02): transfer … $/million, trading agent
Identity & privilege abuse (ASI03): credential, token theft, privilege escalation, takeover, account compromise
Minor safety: CSAM, grooming, predator, minor + sexual
Self-harm & mental health: suicide, overdose, delusion, eating disorder
Violence & dangerous assistance: mass shooting, stabbing, weapon, attack planning, hate speech
Harmful health/medical advice: sodium bromide, bromism, unsafe medical advice
Defamation & fabrication: defame, false accusation, fabricate, hallucinate, falsely accuse

Human review

The catalog is not the work of automation alone. An automated step gathered and sorted the incidents at scale. The team then reviewed the entries by hand: each affected system was checked, each category was confirmed, and each source link was opened to verify that it worked and supported the claim. The final call on every entry rests with the authors.

Date	Affected agentic AI	Class	Category
May 2026	Claude for Chrome	Security exploit	Agent goal hijack
Feb 2026	Claude Desktop Extensions	Security exploit	Unexpected code execution
Apr 2026	Microsoft Semantic Kernel	Security exploit	Unexpected code execution
Apr 2026	Anthropic MCP SDKs	Security exploit	Agentic supply chain
Apr 2026	Grafana AI assistant	Security exploit	Agent goal hijack
May 2026	Claude Code / Codex / Cursor / Copilot CLI	Security exploit	Agentic supply chain
Mar 2026	AI coding assistants (npm/MCP)	Security exploit	Agentic supply chain
Feb 2026	Snowflake Cortex Code	Security exploit	Unexpected code execution
May 2026	Meta AI	Security exploit	Identity and privilege abuse
Apr 2026	Gemini CLI	Security exploit	Unexpected code execution

Don’t miss our benchmarks and data-driven insights. The button opens Google; selecting AIMultiple confirms that you wish to see AIMultiple more often in Google search results.

Add as preferred source

Cite this research

Pick the format that matches where you're publishing. Pasting the link version into your CMS preserves the backlink.

Ezgi Arslan, PhD. (2026) - "AI Agent Vulnerability with 192 Real-life Incidents". Published online at AIMultiple.com. Retrieved June 28, 2026, from: https://aimultiple.com/ai-agent-vulnerability [Online Resource]

PhD., E. A. (2026, June 28). AI Agent Vulnerability with 192 Real-life Incidents. AIMultiple. https://aimultiple.com/ai-agent-vulnerability

@misc{phd2026,
  author = {PhD., Ezgi Arslan,},
  title  = {{AI Agent Vulnerability with 192 Real-life Incidents}},
  year   = {2026},
  month  = jun,
  howpublished    = {\url{https://aimultiple.com/ai-agent-vulnerability}},
  note   = {AIMultiple. Retrieved June 28, 2026}
}

Reference Links

NVD - Home

MSRC - Microsoft Security Response Center

GitHub Advisory Database · GitHub

OWASP Top 10 for Agentic Applications for 2026 - OWASP Gen AI Security Project

OWASP Top 10 for LLM & Generative AI Security

https://incidentdatabase.ai/apps/incidents/

Ezgi Arslan, PhD.

Industry Analyst

Follow On

Ezgi holds a PhD in Business Administration with a specialization in finance and serves as an Industry Analyst at AIMultiple. She drives research and insights at the intersection of technology and business, with expertise spanning sustainability, survey and sentiment analysis, AI agent applications in finance, answer engine optimization, firewall management, and procurement technologies.

View Full Profile

Be the first to comment

Your email address will not be published. All fields are required. Comments are left in their original language.

Vulnerability distribution

AI agent vulnerability categories

AI agent vulnerability methodology

List of the incidents related to AI agent vulnerability

Cite this research

We follow ethical norms & our process for objectivity. This research does not feature any customers of AIMultiple.

Don’t miss our benchmarks and data-driven insights. The button opens Google; selecting AIMultiple confirms that you wish to see AIMultiple more often in Google search results.

Add as preferred source

Next to Read

Agentic AI

Benchmark

Jul 20

AI Agent Vulnerability with 192 Real-life Incidents

Vulnerability distribution

Distribution by category

Safety and guardrail failures

Security exploits

Data exposure

Distribution by year and month

AI agent vulnerability categories

Security exploit categories

Agent goal hijack

Tool misuse and exploitation

Identity and privilege abuse

Agentic supply chain

Unexpected code execution

Memory and context poisoning

Insecure inter-agent communication

Cascading failures

Safety and guardrail failure categories

Self-harm and mental health

Minor safety

Violence and dangerous assistance

Harmful health and medical advice

Defamation and fabrication

Harassment and abusive content

Data exposure and access control

How the categories fit together

AI agent vulnerability methodology

Scope and definition

How incidents were selected

Human review

List of the incidents related to AI agent vulnerability

Cite this research

Link with attributionHTML, for blog posts, LinkedIn articles & newsletters. Recommended.

APA 7th editionFor academic papers and analyst reports following APA 7th style.

BibTeXFor LaTeX documents and academic reference managers.

Reference Links

Be the first to comment

Next to Read

AI VC Benchmark: 11 AI Agents on Real Venture Capital Tasks

VPN Benchmark of Top 5 VPN Providers

AI Agent Traps: 20 Real-Life Incidents

Ninjaone Review: 15 Capabilities for Enterprise IT

Analysis of Top 5 Firewall Change Management Software

Top 15 AI Excel Tools Benchmarked