- Reads the prompt, approves the words
- Agent emails the API key to an attacker
Prompt injection is unsolved. Guardrails check what the model says — Agent Defender checks what the agent does, and strips the dangerous calls before they run.
{
"path": "q3_proposal.pdf",
"pages": [1, 2]
}✓ matched: allow_read_doc ✓ egress check: none required
{
"status": "success",
"result": "Q3 growth projection is 14%..."
}Guardrailscheckwhatthemodelsays.Wecheckwhattheagentdoes—andstripthedangerouscallsbeforetheyeverreachtheworld.
A classifier can approve a prompt and still let the agent exfiltrate a secret in the very next call. The action is where the defender lives.
The action layer is unguarded across the industry — and attackers are already through it. The research is consistent: assume injection will succeed, and govern the action.
of organizations running AI agents reported a security incident — missing or misconfigured guardrails a leading cause.
Gravitee AI security surveyPrompt injection is the top risk on the OWASP GenAI Top 10 (2025) — found in 73% of production deployments reviewed.
OWASP GenAI Top 10 · 2025OpenAI's ChatGPT Atlas was hijacked by a hidden instruction in an email — it acted on the attacker's words instead of the user's task.
OpenAI Atlas red-teamAgent Defender implements the controls these reports recommend — an external policy engine, tool + argument allowlisting, output inspection, and a tamper-evident audit trail — enforced at the action layer. Figures compiled in docs/RESEARCH.md.
Point your agent at the defender — nothing else changes. Each tool call runs a deterministic-first pipeline; the model-backed layers light up only when they need to. Fail-closed by default.
from openai import OpenAI
client = OpenAI(base_url="https://your-defender/v1")
# every tool call the agent makes is now governedMission Control runs a real LLM against a poisoned document. The model takes the bait; the defender strips every dangerous call and signs the decision. Toggle it off to watch the same agent breach.


Each block is a tamper-evident receipt — HMAC-signed over the action, the rule that fired, and the reasoning. An audit trail you can verify, not a log you have to trust.
Mostpeopleonlyseeinterfaces.Thedefenderistheinvisibleoperatinglayerthatdecideswhateveryagentisallowedtodo.