Bottom line: Legitimate AI agents inherently satisfy all three criteria of the “lethal trifecta” (data access, external content, external communication), so security must shift from architectural design to runtime monitoring.

As AI agents are increasingly equipped with data access, external input processing, and communication capabilities, architectural safeguards alone are no longer sufficient. CTOs must implement detection mechanisms at runtime to recognize prompt injection attacks.

Simon Willison, the engineer who coined the term “prompt injection,” warned in June 2025 of the “lethal trifecta”: three capabilities that, combined in an AI agent, form a nearly guaranteed attack vector through indirect prompt injection. The trifecta consists of access to private data, processing of untrusted content, and the ability to communicate externally. Willison documented this attack class with a long list of productive exploits: Microsoft 365 Copilot, GitHub MCP Server, GitLab Duo, Slack AI, Google Bard, and Amazon Q.

Previously, the trifecta could be used as a risk signal because agents were typically narrowly focused. Agents that satisfied only one or two of the capabilities could be assessed as lower risk. This window has closed: a customer-facing support agent reads ticket histories and customer data, processes user reports and files, and calls CRM, refund, and ticketing APIs. An email agent reads mailbox and calendar, processes incoming messages from third parties, and sends responses on behalf of the user. These are not misconfigured exceptions, but the agents that enterprises and individuals actually want and that vendors build.

Ross McKerchar, CISO at Sophos, noted in May 2025 that the capabilities practitioners actually want—reading data, understanding external context, taking action—inevitably lead into dangerous territory. This is not misconfiguration, but the architectural cost of usefulness. An agent without data access is useless, one that cannot process external content is isolated, and one that cannot communicate externally is inert. Removing any leg of the trifecta means building something closer to a search box than an agent.

If every legitimate agent architecture exhibits all three trifecta properties, the trifecta is no longer a meaningful risk indicator—it is the standard configuration. Treating it as a red flag is equivalent to treating DNS resolution as a sign of compromise: technically true in some threat models, but universally present in every real deployment. Meta’s Security Team published in October 2025 the “Rule of Two,” a framework recommending that agents satisfy at most two of the three trifecta properties per session, with human approval required if all three are necessary. Willison himself endorsed the framework as “the most practical advice for secure LLM-based agent systems today.” However, Meta’s Limitations section acknowledges that many desired use cases do not fit neatly into the framework, and that “designs satisfying the Rule of Two can still be error-prone.” This confirms that the problem has outgrown architectural solutions.

The scope of the threat is no longer theoretical. Google’s April 2026 crawl of the Common Crawl repository found prompt injection attempts on public websites, ranging from jokes to data exfiltration payloads, with malicious attempts rising 32 percent between November 2025 and February 2026. The response can no longer reside at the design level—it must necessarily shift to runtime detection and mitigation.

Source: www.csoonline.com · Published June 15, 2026
Lumi AI News — AI-assisted curation in accordance with Art. 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.7.1.

Share on:

Runtime Signals for Detecting Compromised AI Agents

Lumi AI News

Legal

Topics