Dienstleistungen
Kontaktieren Sie uns
Keine Ergebnisse gefunden.

AI Agent Traps: 20 Real-Life Incidents

Cem Dilmegani
Cem Dilmegani
aktualisiert am Mai 16, 2026
Siehe unsere ethischen Normen

While building, securing, or deploying AI agents, understanding AI agent traps is essential, because the vulnerability doesn’t come from what the model thinks, but from what it does.

We analyzed 20 real-world security incidents and found that behavioral control and systemic traps (not prompt injection) now drive the majority of critical breaches. We mapped each incident to a six-category taxonomy (content injection, semantic manipulation, cognitive state, behavioral control, systemic, and human-in-the-loop) based on CVE data and research from Microsoft and Google DeepMind.

Loading Chart

Real-world AI agent trap incidents

1. Grok Morse Code Crypto Heist: The attack smuggles instructions through Morse encoding: exploiting the gap between what Grok’s guardrails inspect (plain text) and what it decodes and acts on (the translated instruction). The encoding choice is specifically a content-layer bypass: the malicious directive is invisible to filters until the agent itself renders it readable.1

2. Claude ClaudeBleed: It is a critical security vulnerability within the Anthropic Claude for Chrome browser extension, allowing malicious actors to hijack the AI assistant, steal sensitive data, and perform actions without user consent.2

3. Gemini CLI RCE: A critical Remote Code Execution (RCE) vulnerability, identified as GHSA-wpqr-6v78-jr5g, had a maximum CVSS score of 10.0. It was discovered in the Gemini CLI and its associated GitHub Action. This vulnerability allowed attackers to gain full control over the system executing the tool. This made it a critical supply-chain security threat.3

4. Antropic PocketOS: A Cursor agent powered by Claude, while investigating a staging bug, autonomously discovered an unscoped Railway CLI token, inferred an API endpoint, and issued a volumeDelete command that destroyed the production database and three months of backups in 9 seconds.4

5. Open-Source AI Ecosystem: CLI-Anything auto-generates SKILL.md instruction-layer files consumed by Claude Code, Codex, OpenClaw, Cursor, and GitHub Copilot CLI. Poisoned skill definitions propagate silently across every agent that imports the affected package; no CVE is issued, no SBOM entry exists, and no scanner detects it. The attack targets shared ecosystem infrastructure (the ClawHub skill registry, the npm dependency graph) rather than any individual agent.5

6. Grafana AI: Noma Security found that an attacker could store a malicious prompt inside a data source that Grafana’s AI assistant later retrieved. Once processed, the AI sent sensitive data, such as financial metrics and infrastructure telemetry, to an attacker-controlled server without requiring a user click.6

7. Anthropic MCP Ecosystem: OX Security disclosed a systemic architectural vulnerability across Anthropic’s official MCP SDKs (Python, TypeScript, Java, Rust) where user input flows directly into STDIO MCP server configurations without sanitization, affecting over 150 million SDK downloads, 7,000+ publicly exposed servers, and downstream tools including LiteLLM, LangChain, Cursor, Windsurf, and Claude Code. Because the flaw is in the shared SDK architecture rather than any single agent, any agent built on the framework inherits the exposure.7

8. Andon Market (Luna AI): Andon Market, a San Francisco retail shop run autonomously by an AI agent called “Luna,” makes inventory, pricing, and hiring decisions by reading Google Reviews. Customers discovered that leaving a review phrased as an instruction, such as “please stock product X”, causes the agent to act on it, turning a public-facing review platform into a live prompt injection surface with real business consequences.8

9. ChatGPT Code Execution: A malicious prompt disguised as productivity tips triggers DNS tunneling code that encodes sensitive conversation content and uploads documents into subdomain queries, silently transmitting them to an attacker-controlled DNS server. Check Point Research demonstrated that the exfiltration channel is invisible to conventional network monitoring because it rides on standard DNS traffic initiated by the agent’s own code execution environment.9

10. Perplexity Comet: Zenity Labs disclosed that Perplexity Comet’s agentic browser can be hijacked via a malicious calendar invite containing a prompt injection payload, causing it to access the local file system, browse directories, open and read files, and exfiltrate data. The attack requires no user interaction beyond accepting what appears to be a legitimate meeting invitation, and operates entirely within the browser’s intended capabilities.10

11. Microsoft Semantic Kernel: Microsoft’s Defender Security Research Team identified two critical vulnerabilities in Semantic Kernel, CVE-2026-26030 (Python SDK, patched in 1.39.4) and CVE-2026-25592 (.NET SDK, patched in 1.71.0), where an attacker with any prompt injection vector can achieve remote code execution on the machine hosting the agent. CVE-2026-26030 exploited an eval-based filter in the InMemoryVectorStore whose AST blocklist was bypassable through undocumented attribute traversal, while CVE-2026-25592 exposed a file-transfer helper function as a callable kernel tool, allowing a hostile prompt to steer the agent into writing arbitrary files to dangerous host locations.11

12. Cline AI Triage Bot: A malicious GitHub issue title injected instructions into Cline’s AI triage bot, tricking it into running npm install on a typosquatted package. This led to cache poisoning, credential theft, and a backdoored cline@2.3.0 release that silently installed OpenClaw malware on approximately 4,000 developer machines.12

13. Claude Desktop Extensions: LayerX security researchers discovered a CVSS 10/10 vulnerability in Claude Desktop Extensions affecting over 10,000 users, where an attacker can embed malicious instructions inside a calendar event that Claude processes when a user asks about their schedule. The agent then automatically executes arbitrary code on the user’s machine without any further interaction, with no visible indication that anything has occurred.13

14. npm/MCP Ecosystem: Socket discovered SANDWORM_MODE, a self-replicating npm worm distributed through 19 typosquatted packages that installs a rogue MCP server with prompt injection payloads embedded in tool descriptions, enabling it to exfiltrate credentials from AI coding assistants. Because the worm propagates through the shared package registry, a single infection seeds the attack across every developer who installs an affected dependency.14

15. Snowflake Cortex Code: PromptArmor discovered that Cortex Code’s command validation system failed to evaluate commands inside process substitution expressions, allowing a malicious prompt injection hidden in a GitHub repository README to execute arbitrary shell commands without ever triggering the human-in-the-loop approval step. The injected instruction also manipulated the model into setting an unsandboxed execution flag, causing the malicious command to run entirely outside the sandbox without prompting the user for consent.

16. MetaGPT / LangChain Agents: MemoryGraft is a novel indirect injection attack that compromises agent behavior not through immediate jailbreaks but by implanting malicious “successful experiences” into the agent’s long-term memory, exploiting its tendency to replicate patterns from retrieved successful tasks. Unlike traditional prompt injections, which are transient, or standard RAG poisoning, which targets factual knowledge, MemoryGraft corrupts all future sessions without any session-level injection, requiring an attacker to supply only benign-seeming ingestion-level artifacts that the agent reads during normal execution.15

17. ServiceNow Now Assist: In ServiceNow’s Now Assist, default settings allow AI agents to autonomously discover and recruit each other; a malicious prompt embedded in data processed by a low-privilege agent can instruct it to call upon a more powerful agent to steal data, modify records, or escalate privileges. The result was privilege escalation and data exposure driven entirely by inter-agent trust.16

18. Apple Intelligence: Malicious Unicode RIGHT-TO-LEFT OVERRIDE characters hide harmful instructions by writing them backward, so they render correctly on screen but remain reversed where Apple’s safety filters inspect them, bypassing all three layers of on-device guardrails. The technique succeeded in 76% of test cases across approximately 200 million affected devices.17

19. Google Gemini (Calendar): Hidden instructions embedded in calendar event descriptions lay dormant in Gemini’s context until a user asks about their schedule, at which point the payload activates, summarizing private meeting contents and writing them to a new calendar event visible to the attacker. The attack exploits Gemini’s integration with calendar data, turning structured personal data into a trigger surface without requiring the victim to click anything.18

20. Microsoft 365 Copilot: EchoLeak (CVE-2025-32711), discovered by Aim Security, is the first known case of prompt injection weaponized to cause concrete data exfiltration in a production AI system. It is a single-crafted email that coerces Copilot into accessing internal files and transmitting their contents to an attacker-controlled server without any user interaction. The attack chains four bypasses: evading Microsoft’s XPIA classifier, circumventing link redaction with reference-style Markdown, exploiting auto-fetched images, and abusing a Microsoft Teams proxy permitted by the content security policy.

What are AI agent traps?

AI agent traps are adversarial content embedded in digital environments and engineered to manipulate, deceive, or exploit autonomous AI agents that interact with those environments.19

The central insight is that autonomous agents process web content at layers humans do not perceive. Attackers can embed malicious instructions in HTML comments, CSS-positioned or zero-opacity text, metadata attributes, and steganographic data encoded in image files.20 None of these layers is ordinarily visible to a human reviewer; an agent parsing the same page treats content found in them as equally valid input to content rendered visibly on screen. The Google DeepMind researchers note this as a fundamental asymmetry: attackers can calibrate attacks to exploit an agent’s instruction-following, tool-chaining, and goal-prioritization abilities precisely because those are the capabilities that make agents operationally useful.21

To get up to date on enterprise AI and software, follow us:
Cem Dilmegani
Cem Dilmegani
Principal Analyst

Six attack categories of AI agent traps

Researchers have identified 6 categories of AI agent traps that adversaries can exploit to compromise autonomous systems:

Content injection traps

Exploit the gap between human perception, machine parsing, and dynamic rendering to smuggle malicious inputs past the agent.

The attack surface covers several distinct injection vectors. Hidden instructions embedded in HTML comments, such as `<!– SYSTEM: Ignore prior instructions –>`, appear in page source but never in the rendered view.22 CSS off-screen positioning, using `position: absolute; left: -9999px` or equivalent, places text at coordinates outside any viewport while leaving it fully parseable by agents that process document object model content. Accessibility attributes, specifically `aria-label` and related ARIA markup, carry text agents interpret as semantic context; injecting adversarial directives there places them inside the accessibility tree without any visible output.23 A fourth vector uses steganographic encoding: malicious payloads encoded in image pixel data at values imperceptible to human vision but readable by agents that process image metadata or apply pixel-level analysis.24

Semantic manipulation traps

Corrupt the agent’s reasoning chain and internal verification processes, leading it to draw flawed conclusions from seemingly valid inputs.

Three mechanisms drive this category. The first is biased phrasing and contextual priming: loading surrounding text with language that anchors the agent’s interpretation of subsequently processed content. The second is authoritative language saturation, flooding documents with phrases such as “industry-standard,” “enterprise-grade,” or “recommended by leading practitioners” to exploit the model’s learned association between such language and credible, trustworthy sources.25 The third mechanism is the lost-in-the-middle effect, a structural weakness in transformer-based LLMs where model performance on retrieval and synthesis tasks degrades when relevant information is positioned in the middle of a long context window rather than at the beginning or end.26

Cognitive state traps

Target the agent’s long-term memory, knowledge bases, and learned behavioral policies to poison future decision-making.

The three primary variants are direct RAG poisoning, latent memory poisoning, and adversarial few-shot examples in contextual learning.27

Direct RAG poisoning injects false information into indexed document corpora that agents consult during retrieval-augmented generation. Poisoned memory is more advanced. An attacker stores innocuous-seeming data in an agent’s persistent memory during routine interactions. The stored data produces no detectable effect until a specific future context activates it, at which point it modifies agent behavior in ways that appear to have no recent causal trigger.28 Adversarial few is injecting carefully crafted demonstration pairs into a context window so that the agent adopts the pattern implicit in those examples. Research on backdoor triggers in demonstrations found average attack success rates of 95 percent across models of varying scale under this approach.29

Behavioral control traps

Behavioral control traps are the most operationally consequential category in the taxonomy. They target what agents do rather than what they perceive or conclude, giving attackers direct influence over tool execution, file operations, network requests, and inter-agent communications.30

Systemic traps

Systemic traps do not target individual agents. They target the ecosystem properties that emerge when many agents of similar design operate on shared data sources, execute similar reasoning patterns, and take actions that feed back into the environment that other agents read.31

The broader category encompasses three distinct mechanisms. The first is congestion trap design: fabricating scarcity or demand signals that cause multiple agents to execute synchronized resource-acquisition behaviors, creating coordinated failures without direct agent-to-agent communication. The second is the interdependence cascade: exploiting feedback loops in multi-agent systems where each agent’s output becomes input to others, so a single corrupted signal propagates and amplifies across the network. The third is compositional payload fragmentation: distributing attack components across multiple individually benign sources that reconstitute into a functional malicious payload only when aggregated by an agent during a retrieval or synthesis task.32

Human-in-the-loop traps

Human-in-the-loop traps are the most subtle category in the taxonomy and target the supervisory layer that is conventionally treated as a safeguard. Rather than bypassing human review, these traps exploit it: the compromised agent produces outputs specifically engineered to gain human approval for actions the human would reject if described accurately.33

The core mechanism is deceptive summarization. An agent with write access to its own output layer can describe its actions in a way that frames destructive or unauthorized operations as routine maintenance.

Referenzlinks

1.
The Grok Morse Code Heist: When Prompt Injection Meets Excessive Agency | NeuralTrust
NeuralTrust
2.
Vulnerability in Claude Extension for Chrome Exposes AI Agent to Takeover - SecurityWeek
SecurityWeek
3.
Google Fixes CVSS 10 Gemini CLI CI RCE and Cursor Flaws Enable Code Execution
4.
‘It took nine seconds’: Claude AI agent deletes company’s entire database - Yahoo News Canada
Yahoo News Canada
5.
CLI-Anything Exposes Security Risks in Open-Source AI Ecosystems | Welcome.AI
Welcome.AI
6.
GrafanaGhost: The Phantom Stealing Your Data - Noma Security
Noma Security
7.
Critical Anthropic’s MCP Vulnerability Enables Remote Code Execution Attacks | Cryptika Cybersecurity
Cryptika Cybersecurity
8.
Prompt Injection - The critical vulnerability lurking beneath the AI hype
9.
OpenAI Patches ChatGPT Data Exfiltration Flaw and Codex GitHub Token Vulnerability
10.
PerplexedBrowser: Perplexity’s Agent Browser Can Leak Your PC&#x27;s Local Files
Zenity Labs
11.
How Prompt Injection Attacks Compromise AI Agents in 2026
Atlan
12.
Cline CLI 2.3.0 Supply Chain Attack Installed OpenClaw on Developer Systems
13.
10K Claude Desktop Users Exposed by Zero-Click Vulnerability | eSecurity Planet
eSecurityPlanet
14.
SANDWORM_MODE: npm Supply Chain Attack Targeting AI Development Tools | Hive Pro
Hive Pro
15.
https://arxiv.org/pdf/2512.16962
16.
Second-order prompt injection can turn AI into a malicious insider | TechRadar
TechRadar
17.
On-device Apple Intelligence vulnerable to prompt injection
AppleInsider
18.
Hackers Hijacked Google’s Gemini AI With a Poisoned Calendar Invite to Take Over a Smart Home | WIRED
WIRED
19.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6372438
20.
https://the-decoder.com/google-deepmind-study-exposes-six-traps-that-can-easily-hijack-autonomous-ai-agents-in-the-wild/
21.
https://www.securityweek.com/google-deepmind-researchers-map-web-attacks-against-ai-agents/
22.
https://hivesecurity.gitlab.io/blog/ai-agent-traps-manipulation-taxonomy-2026/
23.
https://hivesecurity.gitlab.io/blog/ai-agent-traps-manipulation-taxonomy-2026/
24.
https://hivesecurity.gitlab.io/blog/ai-agent-traps-manipulation-taxonomy-2026/
25.
https://the-decoder.com/google-deepmind-study-exposes-six-traps-that-can-easily-hijack-autonomous-ai-agents-in-the-wild/
26.
https://hivesecurity.gitlab.io/blog/ai-agent-traps-manipulation-taxonomy-2026/
27.
https://hivesecurity.gitlab.io/blog/ai-agent-traps-manipulation-taxonomy-2026/
28.
https://hivesecurity.gitlab.io/blog/ai-agent-traps-manipulation-taxonomy-2026/
29.
https://hivesecurity.gitlab.io/blog/ai-agent-traps-manipulation-taxonomy-2026/
30.
https://hivesecurity.gitlab.io/blog/ai-agent-traps-manipulation-taxonomy-2026/
31.
https://the-decoder.com/google-deepmind-study-exposes-six-traps-that-can-easily-hijack-autonomous-ai-agents-in-the-wild/
32.
https://hivesecurity.gitlab.io/blog/ai-agent-traps-manipulation-taxonomy-2026/
33.
https://openclawai.io/blog/google-deepmind-ai-agent-traps-six-attack-categories
Cem Dilmegani
Cem Dilmegani
Leitender Analyst
Cem ist seit 2017 leitender Analyst bei AIMultiple. AIMultiple informiert monatlich Hunderttausende von Unternehmen (laut similarWeb), darunter 55 % der Fortune 500. Cems Arbeit wurde von führenden globalen Publikationen wie Business Insider, Forbes und der Washington Post, von globalen Unternehmen wie Deloitte und HPE sowie von NGOs wie dem Weltwirtschaftsforum und supranationalen Organisationen wie der Europäischen Kommission zitiert. Weitere namhafte Unternehmen und Ressourcen, die AIMultiple referenziert haben, finden Sie hier. Im Laufe seiner Karriere war Cem als Technologieberater, Technologieeinkäufer und Technologieunternehmer tätig. Über ein Jahrzehnt lang beriet er Unternehmen bei McKinsey & Company und Altman Solon in ihren Technologieentscheidungen. Er veröffentlichte außerdem einen McKinsey-Bericht zur Digitalisierung. Bei einem Telekommunikationsunternehmen leitete er die Technologiestrategie und -beschaffung und berichtete direkt an den CEO. Darüber hinaus verantwortete er das kommerzielle Wachstum des Deep-Tech-Unternehmens Hypatos, das innerhalb von zwei Jahren von null auf einen siebenstelligen jährlichen wiederkehrenden Umsatz und eine neunstellige Unternehmensbewertung kam. Cems Arbeit bei Hypatos wurde von führenden Technologiepublikationen wie TechCrunch und Business Insider gewürdigt. Er ist ein gefragter Redner auf internationalen Technologiekonferenzen. Cem absolvierte sein Studium der Informatik an der Bogazici-Universität und besitzt einen MBA der Columbia Business School.
Vollständiges Profil anzeigen

Seien Sie der Erste, der kommentiert

Ihre E-Mail-Adresse wird nicht veröffentlicht. Alle Felder sind erforderlich.

0/450