Agents Rule of Two: A Practical Approach to AI Agent Security

November 4, 2025

Venn diagram titled "Choose Two" with three intersecting circles labeled A, B, and C. Circle A represents "Process untrustworthy inputs," B represents "Access to sensitive systems or private data," and C represents "Change state or communicate externally." The overlapping areas of any two circles are labeled "Safe," while the center intersection of all three is labeled "Danger."

At a high level, the Agents Rule of Two states that until robustness research allows us to reliably detect and refuse prompt injection, agents must satisfy no more than two of the following three properties within a session to avoid the highest impact consequences of prompt injection.

[A] An agent can process untrustworthy inputs

[B] An agent can have access to sensitive systems or private data

[C] An agent can change state or communicate externally

It’s still possible that all three properties are necessary to carry out a request. If an agent requires all three without starting a new session (i.e., with a fresh context window), then the agent should not be permitted to operate autonomously and at a minimum requires supervision — via human-in-the-loop approval or another reliable means of validation.
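To make the session-level logic concrete, here is a minimal Python sketch of how an agent runtime might enforce the rule. It rests on a few assumptions, and none of its names come from Meta's post:

- A hypothetical `Session` object records which of the three properties the session has already acquired. Those properties persist until a fresh session starts.
- A hypothetical `request` check is called by the orchestrator before each tool use.
- The `supervised` flag stands in for whatever human-in-the-loop or other validation mechanism is available.

```python
from dataclasses import dataclass, field
from enum import Enum


class Prop(Enum):
    """The three properties named by the Agents Rule of Two."""
    UNTRUSTED_INPUT = "A"   # processes untrustworthy inputs
    SENSITIVE_ACCESS = "B"  # accesses sensitive systems or private data
    EXTERNAL_EFFECT = "C"   # changes state or communicates externally


@dataclass
class Session:
    """Hypothetical session record; properties accumulate until a new Session is created."""
    props: set[Prop] = field(default_factory=set)
    supervised: bool = False  # assumption: a human approver (or other validator) is attached


def request(session: Session, needed: set[Prop]) -> str:
    """Decide whether an action requiring `needed` properties may proceed.

    Returns "allow", "needs_approval" or "deny" (illustrative labels only).
    """
    combined = session.props | needed
    if len(combined) <= 2:
        # At most two of A, B, C in this session: may run autonomously.
        session.props = combined
        return "allow"
    if session.supervised:
        # All three would be present: escalate instead of acting autonomously.
        # Properties are not recorded until the action is actually approved.
        return "needs_approval"
    # All three would be present and no supervision is available.
    return "deny"


if __name__ == "__main__":
    s = Session()
    print(request(s, {Prop.UNTRUSTED_INPUT}))   # allow
    print(request(s, {Prop.SENSITIVE_ACCESS}))  # allow
    print(request(s, {Prop.EXTERNAL_EFFECT}))   # deny
    fresh = Session()  # a new session starts with a clean slate
    print(request(fresh, {Prop.SENSITIVE_ACCESS, Prop.EXTERNAL_EFFECT}))  # allow
```

The sketch tracks the union of properties across the whole session rather than checking each tool call in isolation. This reflects the rule's "within a session" wording: reading an untrusted web page early on still counts when the agent later tries to send data somewhere.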

Source: Agents Rule of Two: A Practical Approach to AI Agent Security

Agentic systems built on large language models face some very fundamental security challenges. These are most acute where what Simon Willison has called the “lethal trifecta” exists:

Access to your private data—one of the most common purposes of tools in the first place!
Exposure to untrusted content—any mechanism by which text (or images) controlled by a malicious attacker could become available to your LLM
The ability to externally communicate in a way that could be used to steal your data (I often call this “exfiltration” but I’m not confident that term is widely understood.)

Some have gone so far as to argue this makes any kind of agentic software system fundamentally insecure in all circumstances.

The security team at Meta has developed the Agents Rule of Two, an approach to minimising these security challenges. If you work with these systems, it is very much worth a read.
