Guard every action your AI agents take.

Prompt injection is unsolved. Guardrails check what the model says — Agent Defender checks what the agent does, and strips the dangerous calls before they run.

GATEWAY MONITOR: INTERCEPTOR_V1
SYSTEM SECURE · ALL ACTIONS COMPLIANT
UNTRUSTED EMITTER (AGENT)
call: read_doc
{
  "path": "q3_proposal.pdf",
  "pages": [1, 2]
}
ALLOWED
POLICY & COMPLIANCE GATE
✓ matched: allow_read_doc
✓ egress check: none required
EXECUTION OUTCOME
{
  "status": "success",
  "result": "Q3 growth projection is 14%..."
}

Guardrails check what the model says. We check what the agent does and strip the dangerous calls before they ever reach the world.

Guardrails read text. Damage happens in actions.

A classifier can approve a prompt and still let the agent exfiltrate a secret in the very next call. The action is where the defender lives.

Text guardrail
  1. Reads the prompt, approves the words
  2. Agent emails the API key to an attacker
Breach
Agent Defender
  1. Inspects the action the model emitted
  2. Strips send_email before the agent can run it
Held

This isn't hypothetical.

The action layer is unguarded across the industry — and attackers are already through it. The research is consistent: assume injection will succeed, and govern the action.

88%

of organizations running AI agents reported a security incident, with missing or misconfigured guardrails a leading cause.

Gravitee AI security survey
#1

Prompt injection is the top risk on the OWASP GenAI Top 10 (2025) — found in 73% of production deployments reviewed.

OWASP GenAI Top 10 · 2025

OpenAI's ChatGPT Atlas was hijacked by a hidden instruction in an email — it acted on the attacker's words instead of the user's task.

OpenAI Atlas red-team

Agent Defender implements the controls these reports recommend — an external policy engine, tool + argument allowlisting, output inspection, and a tamper-evident audit trail — enforced at the action layer. Figures compiled in docs/RESEARCH.md.

One base_url. Every action checked.

Point your agent at the defender — nothing else changes. Each tool call runs a deterministic-first pipeline; the model-backed layers light up only when they need to. Fail-closed by default.

  1. Deterministic rules: tool + egress + secret · 0ms
  2. Prompt Guard 2: injection on tool results
  3. PII redaction: secrets masked in flight
  4. Cost guard: per-session budget
  5. Safeguard reasoner: policy-following, selective
from openai import OpenAI
client = OpenAI(base_url="https://your-defender/v1")
# every tool call the agent makes is now governed
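What the first, deterministic stage of that pipeline does, and what "strips send_email before the agent can run it" looks like in code, as an added example written for this page rather than Agent Defender's own implementation. It assumes a small allowlist policy format, rule names and secret patterns of its own (all constructed), and it is fail-closed: a tool call is forwarded only if it positively matches the policy, passes the egress check and carries no secret in its arguments.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Constructed policy (an assumption, not Agent Defender's format): a tool may run only
# if listed here; each listed argument must match its pattern (None = no constraint);
# arguments the policy does not list are rejected. Unlisted tools are denied outright.
POLICY = {
    "read_doc": {"args": {"path": re.compile(r"[\w\-. ]+\.(pdf|md|txt)"), "pages": None}},
    "http_get": {"args": {"url": None}, "egress_allow": {"api.example.com"}},
}

# Constructed secret detectors; a real deployment would carry its own set.
SECRET_PATTERNS = [re.compile(r"sk-[A-Za-z0-9]{20,}"), re.compile(r"AKIA[0-9A-Z]{16}")]


@dataclass
class Decision:
    tool: str
    action: str  # "allow" or "block"
    rule: str    # the deterministic rule that fired
    reason: str


def _strings(value):
    """Yield every string nested anywhere in a JSON-like argument structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from _strings(v)
    elif isinstance(value, (list, tuple)):
        for v in value:
            yield from _strings(v)


def evaluate(call: dict) -> Decision:
    """Apply the deterministic rules to one emitted tool call. Fail-closed: any call
    that does not positively match the policy is blocked."""
    name = call.get("name", "")
    args = call.get("arguments") or {}
    rule = POLICY.get(name)
    if rule is None:
        return Decision(name, "block", "tool_allowlist", f"tool '{name}' is denied")
    for arg, pattern in rule["args"].items():
        if arg not in args:
            return Decision(name, "block", "arg_allowlist", f"missing argument '{arg}'")
        if pattern is not None and not pattern.fullmatch(str(args[arg])):
            return Decision(name, "block", "arg_allowlist", f"argument '{arg}' rejected")
    extra = set(args) - set(rule["args"])
    if extra:
        return Decision(name, "block", "arg_allowlist", f"unexpected arguments {sorted(extra)}")
    if "egress_allow" in rule:
        host = urlparse(str(args.get("url", ""))).hostname or ""
        if host not in rule["egress_allow"]:
            return Decision(name, "block", "egress_allowlist", f"host '{host}' is not allowed")
    for text in _strings(args):
        if any(p.search(text) for p in SECRET_PATTERNS):
            return Decision(name, "block", "secret_guard", "secret found in tool arguments")
    return Decision(name, "allow", f"allow_{name}", "matched policy")


def filter_tool_calls(calls: list) -> tuple:
    """Return the calls that may run plus a decision for every call, in order."""
    decisions = [evaluate(c) for c in calls]
    allowed = [c for c, d in zip(calls, decisions) if d.action == "allow"]
    return allowed, decisions


# Constructed sample: what a hijacked agent might emit after reading a poisoned document.
HIJACKED_TOOL_CALLS = [
    {"name": "read_doc", "arguments": {"path": "q3_proposal.pdf", "pages": [1, 2]}},
    {"name": "send_email", "arguments": {"to": "someone@mail.example", "body": "the key"}},
    {"name": "http_get", "arguments": {"url": "https://not-on-allowlist.example/collect"}},
    {"name": "http_get", "arguments": {"url": "https://api.example.com/?k=AKIAABCDEFGHIJKLMNOP"}},
]

if __name__ == "__main__":
    runnable, decisions = filter_tool_calls(HIJACKED_TOOL_CALLS)
    for d in decisions:
        print(f"{d.action:5} {d.tool:10} rule={d.rule} reason={d.reason}")
    print("calls forwarded to the executor:", [c["name"] for c in runnable])
```

Only the read_doc call survives the gate; the unlisted send_email is dropped, the first http_get fails the egress allowlist, and the second one, whose host is allowed, is still held because a credential-shaped string appears in its arguments. The ordering matters: cheap, deterministic checks run first and never consult a model, which is why this stage can sit at 0ms in front of the model-backed layers.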

Watch an agent get hijacked — and held.

Mission Control runs a real LLM against a poisoned document. The model takes the bait; the defender strips every dangerous call and signs the decision. Toggle it off to watch the same agent breach.

Mission Control showing the defender blocking three send_email attempts with a 'Firewall held' verdict
Mission Control — agent hijacked, defender held.
Live feed dashboard streaming blocked, redacted, and flagged defender decisions
Live feed — every decision, as it happens.

Every decision, signed.

Each block is a tamper-evident receipt — HMAC-signed over the action, the rule that fired, and the reasoning. An audit trail you can verify, not a log you have to trust.

action: block
rule: deterministic_rules
reason: tool 'send_email' is denied
signature: c2539dec…ad33
✓ verified
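How such a receipt can be signed and later verified, as an added example for this page and not Agent Defender's own receipt format. The field names, the canonical JSON encoding and the demo key are assumptions. It goes one step beyond the page's description by chaining each receipt to the previous signature, so that a deleted or reordered entry is caught as well as an edited one.

```python
import hashlib
import hmac
import json

# Constructed demo key (an assumption for this example); a real deployment keeps the
# key outside the agent's reach so the agent cannot forge or rewrite its own receipts.
DEMO_KEY = b"constructed-demo-key-not-for-use"


def _canonical(fields: dict) -> bytes:
    """Canonical JSON so signer and verifier always hash identical bytes."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()


def sign_receipt(key: bytes, seq: int, action: str, rule: str, reason: str, prev_sig: str) -> dict:
    """Build a receipt for one decision and HMAC-sign it over the action, the rule that
    fired and the reason. `seq` and `prev_sig` link it to the receipt before it."""
    fields = {"seq": seq, "action": action, "rule": rule, "reason": reason, "prev": prev_sig}
    signature = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    return {**fields, "signature": signature}


def verify_receipt(key: bytes, receipt: dict) -> bool:
    """Recompute the HMAC over every field except the signature; compare in constant time."""
    fields = {k: v for k, v in receipt.items() if k != "signature"}
    expected = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(receipt.get("signature", "")))


def verify_chain(key: bytes, receipts: list) -> bool:
    """Verify every receipt and that each one links to the signature before it."""
    prev = ""
    for i, r in enumerate(receipts):
        if r.get("seq") != i or r.get("prev") != prev or not verify_receipt(key, r):
            return False
        prev = r["signature"]
    return True


def build_demo_chain(key: bytes) -> list:
    """Constructed decisions mirroring the page's scenario: one read allowed, two sends blocked."""
    decisions = [
        ("allow", "allow_read_doc", "matched policy"),
        ("block", "deterministic_rules", "tool 'send_email' is denied"),
        ("block", "deterministic_rules", "tool 'send_email' is denied"),
    ]
    chain, prev = [], ""
    for seq, (action, rule, reason) in enumerate(decisions):
        receipt = sign_receipt(key, seq, action, rule, reason, prev)
        chain.append(receipt)
        prev = receipt["signature"]
    return chain


if __name__ == "__main__":
    chain = build_demo_chain(DEMO_KEY)
    for r in chain:
        print(r["action"], r["rule"], r["signature"][:8] + "..." + r["signature"][-4:])
    print("chain verifies:", verify_chain(DEMO_KEY, chain))
    forged = dict(chain[1], action="allow")  # a record rewritten after the fact
    print("forged receipt verifies:", verify_receipt(DEMO_KEY, forged))
```

The verifier needs only the key and the receipts, which is the difference between an audit trail you can check and a log you have to trust: flipping a block to an allow, swapping the rule name, dropping an entry or replaying it under a different key all fail verification.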

Most people only see interfaces. The defender is the invisible operating layer that decides what every agent is allowed to do.

The operating layer for AI agents.

View on GitHub →