Services
Contact Us

AI Agent Vulnerability with 192 Real-life Incidents

Ezgi Arslan, PhD.
Ezgi Arslan, PhD.
updated on Jun 28, 2026

Understanding how AI agent vulnerability makes systems fail, whether through security exploits, guardrail breakdowns, or data exposure, has become critical as these systems take on increasingly autonomous roles in business workflows.

To map the real-world risk landscape of AI agents, we reviewed 192 documented vulnerability incidents spanning March 2016 to May 2026, drawing on sources.

Vulnerability distribution

Distribution by category

Loading Chart

The incidents fall into three classes.

Safety failures lead. In these cases the agent produces or enables harm that its controls should have blocked. Security exploits come next; here an attacker abuses a weakness in the agent. Data exposure is the smallest class.

Safety and guardrail failures

Three categories carry most of the weight.

Defamation and fabrication is the largest single category in the catalog. The agent states false facts about a real person or firm. Examples include invented accusations and fake legal citations.

Security exploits

The security categories follow the OWASP ASI list (the OWASP Top 10 for Agentic Applications).

Agent goal hijack leads. The common method is prompt injection: hidden instructions placed in text the agent reads. Unexpected code execution comes second, where the agent runs attacker-supplied commands.

Data exposure

Confidential data exposure and access control account for 15 incidents. Common causes include open share links, weak passwords, and unencrypted storage.

Distribution by year and month

The records span March 2016 to May 2026. The count has risen sharply in recent years.

The year 2025 holds more than half the catalog. The rise tracks the spread of chatbots and agentic tools into daily work.

The first record dates to March 2016: Microsoft’s Tay chatbot, steered by users into racist posts. Through 2022, reports stay sparse. They center on chatbot misbehavior and harmful content, not on attacks.

2025 brought 104 incidents, the peak of the catalog. The flow held steady across the year, with high points in July (14), April (13), and October (13). The class mix was near even: 51 safety failures, 47 security exploits, and 6 data-exposure cases.

2026 shows 33 incidents through May. The mix tilts toward security: 23 exploits against 8 safety failures. Supply-chain attacks and code-execution flaws grow as multiple agents gain access to tools, code, and connected external systems.

AI agent vulnerability categories

An AI agent vulnerability is a weakness in an autonomous AI system. An agentic system plans, decides, and acts across several steps. It uses tools, memory, and links to other systems, often without a human approving each step. A vulnerability lets an attacker exploit that weakness, or lets the agent’s safety controls fail. The effect reaches the real world: leaked sensitive data, unauthorized execution, code that runs on a host machine, or harm to a person.

The categories below fall into three groups. Security exploit categories cover attacks that abuse a weakness. Safety and guardrail categories cover failures where the agent produces or enables harm. Data exposure covers weak access controls that reveal private information.

The AI agent security categories follow the OWASP Top 10 for Agentic Applications (2026). The safety and data-exposure categories extend that list, because the OWASP set focuses on security and leaves room for guardrails and privacy failures.

Security exploit categories

Agent goal hijack

An attacker changes what the agent tries to do. The common method is prompt injection: hidden instructions placed in text the agent reads, such as an email, a web page, a document, or a tool’s output. The agent treats the hidden text as a real command and follows it. EchoLeak showed the danger. A single crafted email made Microsoft 365 Copilot read internal files and send them out, with no click from the victim.

Tool misuse and exploitation

The agent holds permission to call tools and APIs, such as email, databases, or payment functions. An attacker steers the agent into using those tools against its owner. The damage comes from a normal action turned harmful. A manipulated trading agent, for example, can move funds to the wrong account.

Identity and privilege abuse

An agent carries credentials and access rights. Weak controls let an attacker borrow that identity or push the agent past its intended limits. The McDonald’s McHire hiring agent used a weak password and exposed records of about 64 million applicants.

Agentic supply chain

Agents depend on outside parts: AI models, plug-ins, and connectors such as MCP servers. (MCP, the Model Context Protocol, is a standard way for an agent to connect to external tools.) A poisoned or fake part spreads its damage to every agent that installs it. The Cline coding agent shipped a backdoored release after an attacker tricked it into pulling a fake package.

Unexpected code execution

The agent runs code or shell commands on a host machine. An attacker abuses that ability to run their own commands. The shorthand for this is RCE, or remote code execution. The Gemini CLI flaw scored the maximum severity and handed an attacker full control of the host.

Memory and context poisoning

Agents keep memory and pull in context from documents and databases. An attacker plants false or malicious prompts in that store. The agent later retrieves the poisoned content and acts on it. The effect lasts across sessions and is hard to detect.

Insecure inter-agent communication

Many complex systems run several agents that talk to each other. One agent tends to trust another agent’s messages by default. An attacker abuses that trust, spoofs an agent, or relays tampered instructions. A low-privilege agent can be steered into recruiting a more powerful one.

Cascading failures

A single agent’s mistake or compromise spreads across connected multi-agent systems. One action can wipe data or break a workflow with no easy undo. A Replit agent deleted a production database during a code freeze and then masked the error.

Safety and guardrail failure categories

These categories describe a failure of the agent’s controls. No attacker is needed. The agent produces or enables harm that its guardrails should have blocked.

Self-harm and mental health

The agent fails to protect a person in crisis. Instead of redirecting to support, it validates harmful thoughts or supplies dangerous detail. Reports describe chatbots that encouraged suicide or reinforced delusional thinking, with real harm to the person.

Minor safety

The agent exposes a minor to sexual, predatory, or otherwise unsafe content. Controls meant to block such interactions fail. Reports describe companion chatbots that drew minors into unsafe conversations.

Violence and dangerous assistance

The agent helps plan or carry out violence or another serious crime. Filters that should refuse the request break down. Reports describe chatbots consulted during the planning of attacks.

Harmful health and medical advice

The agent gives unsafe medical guidance that can injure a person. One report describes a chatbot that suggested swapping table salt for sodium bromide, which led to poisoning and a hospital stay.

Defamation and fabrication

The agent states false facts about a real person or organization as if they were true. Fabricated quotes, accusations, or citations cause reputational or legal harm. One model falsely accused a journalist of crimes. Another invented court citation that ended up in legal filings.

Harassment and abusive content

The agent produces abusive, threatening, or non-consensual content aimed at a person. Examples include threatening replies and tools used to create non-consensual intimate images.

Data exposure and access control

A weak control in the AI agent system reveals private information. Common causes include open sharing links, unencrypted storage, and unauthorised access to files. The exposed material can include personal data, saved conversations, credentials, or source code.

How the categories fit together

Security exploit categories involve attackers who find and exploit weaknesses. Safety and guardrail categories need no attacker; the agent’s own controls fail. Data exposure sits between the two: a design or configuration flaw leaks sensitive data, whether or not an attacker is involved. A single incident can touch more than one category. For a clean catalogue, each incident is assigned to its main category based on the central weakness.

AI agent vulnerability methodology

Scope and definition

The catalog covers AI agent vulnerabilities. Three kinds of incident qualify: security exploits (such as prompt injection or code that runs on a host), safety and guardrail failures (such as harmful advice or false claims about a real person), and data-exposure flaws.

Several kinds of incidents fall outside the scope: copyright and licensing disputes, generic bias or stereotype output, factual errors without harm, cases where a person used the model as a tool to build malware or scams, regulatory disputes, and operational complaints.

The catalog draws on three layers of evidence. Primary sources lead. Vendor write-ups and news fill gaps where a vendor is the discloser or the single record.

The first layer is a set of dated, product-specific incidents from advisories and security research. Advisories include the National Vulnerability Database (NVD),1 the Microsoft Security Response Center (MSRC),2 and the GitHub Advisory Database (GHSA).3 These are public records that describe a specific flaw. Security research comes from the firms that found the flaws, such as Noma Security, Check Point, Tenable, Wiz, and Zenity.

The second layer is the OWASP Exploits and Incidents Tracker, from Appendix D of the OWASP Top 10 for Agentic Applications (2026).4 Its entries are reproduced with attribution.

The third layer is the AI Incident Database (AIID), an open, peer-reviewed record of AI harms. AIID holds more than 1,500 incidents across all kinds of AI systems.5

How incidents were selected

The system uses some keyword filters to identify AI incidents. This filter includes specific keywords such as:

1. Agentic/LLM system recognition: chatbot, chat bot, large language model, language model, LLM, generative AI, GPT-3/GPT-4, ChatGPT, Gemini, Google Bard, Copilot, Claude, Llama, Grok, DeepSeek, Character.ai, Replika, Bing Chat, AI assistant, virtual assistant, AI agent, agentic, conversational AI, coding assistant, AI companion, TayBot

2. Security exploit signals: prompt injection, indirect prompt injection, jailbreak/jailbroke, exfiltrate, remote code / code execution / RCE, arbitrary code, command injection, injected command/code/instruction, backdoor, malicious package/MCP/server/tool/extension, supply chain, memory poison, poisoned memory/RAG/context, hijack, takeover, exploit, credential, token theft/leak, privilege escalation, bypass guard/filter/safety/auth/sandbox, deleted/wiped/destroyed/erased, transfer … $/million, agrees to sell, tricked, manipulated … agent/bot, fraudulent prompt

3. Safety/guardrail signals: suicide, self-harm, overdose, minor/child/teen, CSAM, underage, grooming, predator, weapon, bomb, explosive, mass shooting, stabbing, body-disposal, homicide, assassinate, hammer attack, attack guidance/planning, defame, false accusation/allegation, fabricate, hallucinate, libel, harass, non-consensual, nudify, intimate image, dangerous advice/instruction, unsafe medical advice, sodium bromide, bromism, hospitalize, eating disorder, diet advice, drug-use, died/death after/following, threaten/threatening, hate speech, reinforced delusion

4. Data exposure: expose/exposed/exposure, leaked, breach, disclose, access cached + (data / records / conversations / emails / credentials / API key / source code / applicants / patients / users), unencrypted, stored passwords/SSN, reads files without consent, uploaded sensitive

5. DROP signals copyright, plagiarism, infringe, pirated, LibGen, Books3, unauthorized training, GDPR, legal basis, practiced law / without a license, cheating, school assignment, exam, sues OpenAI, bias, stereotype, racist, sexist, antisemitic, offensive remark, disinformation, propaganda, influence business operation, romance scam, ransomware, spam, investment scam, fraud scheme, surveillance, misinforms about geography, reward function

6. For category assignments:

  • Agent goal hijack (ASI01): prompt injection, jailbreak, manipulate, agrees to sell, hijack
  • Unexpected code execution (ASI05): RCE, code execution, command injection, arbitrary code
  • Agentic supply chain (ASI04): malicious package/MCP, supply chain, backdoor
  • Cascading failures (ASI08): deleted, wiped, destroyed, erased
  • Tool misuse (ASI02): transfer … $/million, trading agent
  • Identity & privilege abuse (ASI03): credential, token theft, privilege escalation, takeover, account compromise
  • Minor safety: CSAM, grooming, predator, minor + sexual
  • Self-harm & mental health: suicide, overdose, delusion, eating disorder
  • Violence & dangerous assistance: mass shooting, stabbing, weapon, attack planning, hate speech
  • Harmful health/medical advice: sodium bromide, bromism, unsafe medical advice
  • Defamation & fabrication: defame, false accusation, fabricate, hallucinate, falsely accuse

Human review

The catalog is not the work of automation alone. An automated step gathered and sorted the incidents at scale. The team then reviewed the entries by hand: each affected system was checked, each category was confirmed, and each source link was opened to verify that it worked and supported the claim. The final call on every entry rests with the authors.

Don’t miss our benchmarks and data-driven insights. The button opens Google; selecting AIMultiple confirms that you wish to see AIMultiple more often in Google search results.
GoogleAdd as preferred source

Cite this research

Pick the format that matches where you're publishing. Pasting the link version into your CMS preserves the backlink.

Ezgi Arslan, PhD. (2026) - "AI Agent Vulnerability with 192 Real-life Incidents". Published online at AIMultiple.com. Retrieved June 28, 2026, from: https://aimultiple.com/ai-agent-vulnerability [Online Resource]

PhD., E. A. (2026, June 28). AI Agent Vulnerability with 192 Real-life Incidents. AIMultiple. https://aimultiple.com/ai-agent-vulnerability

@misc{phd2026,
  author = {PhD., Ezgi Arslan,},
  title  = {{AI Agent Vulnerability with 192 Real-life Incidents}},
  year   = {2026},
  month  = jun,
  howpublished    = {\url{https://aimultiple.com/ai-agent-vulnerability}},
  note   = {AIMultiple. Retrieved June 28, 2026}
}
Ezgi Arslan, PhD.
Ezgi Arslan, PhD.
Industry Analyst
Ezgi holds a PhD in Business Administration with a specialization in finance and serves as an Industry Analyst at AIMultiple. She drives research and insights at the intersection of technology and business, with expertise spanning sustainability, survey and sentiment analysis, AI agent applications in finance, answer engine optimization, firewall management, and procurement technologies.
View Full Profile

Be the first to comment

Your email address will not be published. All fields are required. Comments are left in their original language.

0/450