Guard every action your AI agents take.

Prompt injection is unsolved. Guardrails check what the model says — Agent Defender checks what the agent does, and strips the dangerous calls before they run.

See it block an attack →

GATEWAY MONITOR: INTERCEPTOR_V1

SYSTEM SECURE · ALL ACTIONS COMPLIANT

UNTRUSTED EMITTER (AGENT)

call: read_doc

{
  "path": "q3_proposal.pdf",
  "pages": [1, 2]
}

ALLOWED

POLICY & COMPLIANCE GATE

✓ matched: allow_read_doc
✓ egress check: none required

EXECUTION OUTCOME

{
  "status": "success",
  "result": "Q3 growth projection is 14%..."
}

Guardrailscheckwhatthemodelsays.Wecheckwhattheagentdoes—andstripthedangerouscallsbeforetheyeverreachtheworld.

Guardrails read text. Damage happens in actions.

A classifier can approve a prompt and still let the agent exfiltrate a secret in the very next call. The action is where the defender lives.

Text guardrail

Reads the prompt, approves the words
Agent emails the API key to an attacker

Breach

Agent Defender

Inspects the action the model emitted
Strips send_email before the agent can run it

Held

This isn't hypothetical.

The action layer is unguarded across the industry — and attackers are already through it. The research is consistent: assume injection will succeed, and govern the action.

88%

of organizations running AI agents reported a security incident — missing or misconfigured guardrails a leading cause.

Gravitee AI security survey

Prompt injection is the top risk on the OWASP GenAI Top 10 (2025) — found in 73% of production deployments reviewed.

OWASP GenAI Top 10 · 2025

✉

OpenAI's ChatGPT Atlas was hijacked by a hidden instruction in an email — it acted on the attacker's words instead of the user's task.

OpenAI Atlas red-team

Agent Defender implements the controls these reports recommend — an external policy engine, tool + argument allowlisting, output inspection, and a tamper-evident audit trail — enforced at the action layer. Figures compiled in docs/RESEARCH.md.

One base_url. Every action checked.

Point your agent at the defender — nothing else changes. Each tool call runs a deterministic-first pipeline; the model-backed layers light up only when they need to. Fail-closed by default.

1Deterministic rulestool + egress + secret · 0ms

2Prompt Guard 2injection on tool results

3PII redactionsecrets masked in flight

4Cost guardper-session budget

5Safeguard reasonerpolicy-following, selective

from openai import OpenAI
client = OpenAI(base_url="https://your-defender/v1")
# every tool call the agent makes is now governed

Watch an agent get hijacked — and held.

Mission Control runs a real LLM against a poisoned document. The model takes the bait; the defender strips every dangerous call and signs the decision. Toggle it off to watch the same agent breach.

Mission Control showing the defender blocking three send_email attempts with a 'Firewall held' verdict — Mission Control — agent hijacked, defender held.

Live feed dashboard streaming blocked, redacted, and flagged defender decisions — Live feed — every decision, as it happens.

Every decision, signed.

Each block is a tamper-evident receipt — HMAC-signed over the action, the rule that fired, and the reasoning. An audit trail you can verify, not a log you have to trust.

actionblock

ruledeterministic_rules

reasontool 'send_email' is denied

signaturec2539dec…ad33

✓ verified

Mostpeopleonlyseeinterfaces.Thedefenderistheinvisibleoperatinglayerthatdecideswhateveryagentisallowedtodo.

The operating layer for AI agents.

View on GitHub →