Your AI agent just deleted 10,000 records because a user said "clean this up." Here's how to prevent it.
The Promise and Risk of Agentic AI
Agents that automate tasks end-to-end are powerful—and dangerous when they combine broad permissions, no approval step, and no audit trail. Misinterpretation, permission abuse, cascading failures, and the lack of an undo all turn into real incidents. The same flexibility that lets an agent "just do it" also lets it do the wrong thing at scale.
Production agents that can read mail, update CRMs, send messages, or run queries are increasingly common. The risk isn't that the model is "bad"—it's that natural language is ambiguous and the agent has real power. "Clean this up" might mean delete old drafts to one user and delete all records to another. Without governance, one misinterpretation can cause irreversible damage. The fix is to constrain what the agent can do, require confirmation for high-impact actions, and log everything so you can audit and roll back.
Real Agent Failures
- Refund agent approved a $50K refund when the user meant $50—no cap, no approval step, and the agent parsed "50k" as 50,000.
- Calendar agent deleted all meetings after a vague "clear my calendar" request—user meant "clear this afternoon"; the agent had delete-all permission.
- Integration agent sent an email to 100K users instead of one test recipient—the "test" flag was in the prompt but the agent chose the full list.
Common thread: broad permissions, ambiguous user intent, and no confirmation for high-impact actions. Governance has to constrain what the agent can do and add checks before irreversible or high-cost operations.
"Implement dry-run and approval gates before you give agents write access to production data."
The Governance Framework
- Scoped permissions—read-only vs write; data-level access (e.g. this tenant only).
- Human-in-the-loop—critical actions require approval (refunds, bulk deletes, external sends).
- Dry-run mode—show what the agent would do; don't execute until confirmed.
- Confidence thresholds—low confidence → ask human.
- Rate limiting—e.g. max 10 actions per minute per user.
- Audit logging—every action, every decision, who triggered it.
- Rollback—undo last N actions or revert to checkpoint.
- Testing—adversarial scenarios (ambiguous instructions, malicious inputs).
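Several of these controls fit in a small policy object. The sketch below combines scoped permissions, rate limiting, and a confidence threshold; the class and method names (`PolicyLayer`, `allowed`) are illustrative assumptions, not a specific library's API:

```python
import time
from collections import defaultdict, deque

class PolicyLayer:
    """Illustrative policy layer: scoped permissions, rate limit, confidence gate."""

    def __init__(self, max_actions_per_minute=10, min_confidence=0.8):
        self.grants = defaultdict(set)      # user_id -> granted scopes, e.g. {"read"}
        self.history = defaultdict(deque)   # user_id -> timestamps of recent actions
        self.max_per_minute = max_actions_per_minute
        self.min_confidence = min_confidence

    def grant(self, user_id, scope):
        self.grants[user_id].add(scope)

    def allowed(self, user_id, action_scope, confidence, now=None):
        now = time.time() if now is None else now
        # Scoped permissions: the user must hold the scope the action needs.
        if action_scope not in self.grants[user_id]:
            return False, "scope not granted"
        # Rate limiting: drop timestamps older than 60s, then count what's left.
        window = self.history[user_id]
        while window and now - window[0] > 60:
            window.popleft()
        if len(window) >= self.max_per_minute:
            return False, "rate limit exceeded"
        # Confidence threshold: low-confidence interpretations go to a human.
        if confidence < self.min_confidence:
            return False, "confidence below threshold: ask a human"
        window.append(now)
        return True, "ok"
```

A user granted only `"read"` is refused a `"write"` action before any tool runs, and the returned reason string can be logged as part of the audit trail.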
Compliance: SOC 2, HIPAA, and GDPR all expect access control, audit trails, and least privilege. Implement dry-run and approval gates before you give agents write access to production data.
Implementing Governance in Code
In code, that usually means: an action registry (allowed tools and max scope), a policy layer that checks "is this action allowed for this user/tenant?", and an execution layer that either runs in dry-run (return planned actions) or waits for approval before calling the tool. Example: before executing any tool call, check against a policy and optionally require approval for high-impact actions.
# Pseudocode: governance check before execution
def execute_action(agent, action, user_id):
    if not policy.allowed(action, user_id):
        return error("Action not permitted")
    if action.impact == "high" and not approval_gate.is_approved(action):
        return dry_run_result(action)  # Show the plan, wait for approval
    audit_log.record(user_id, action)
    return run(action)
We'll cover architecture for a safe agentic system and a full governance wrapper in a follow-up. The key is that every action goes through the same pipeline: policy check → approval (if required) → audit → execute.
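The rollback control can be sketched as an executor that records an inverse operation alongside every action it runs, so the last N actions can be undone in reverse order. `AuditedExecutor` and its method names are illustrative assumptions, not part of any framework:

```python
class AuditedExecutor:
    """Illustrative executor: logs every action and can undo the last N."""

    def __init__(self):
        self.log = []  # (user_id, action, undo) tuples, oldest first

    def execute(self, user_id, action, do, undo):
        """Run `do`, remembering `undo` so the action can be reverted later."""
        result = do()
        self.log.append((user_id, action, undo))
        return result

    def rollback(self, n=1):
        """Undo the last n logged actions, newest first."""
        for _ in range(min(n, len(self.log))):
            user_id, action, undo = self.log.pop()
            undo()
```

For example, an "update record" action would be registered with a `do` that writes the new value and an `undo` that restores the old one; calling `rollback()` after an incident replays the inverses. Real systems would persist this log and handle actions with no clean inverse (like external sends) by gating them on approval instead.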
What to Do Next
Before shipping an agent with write access, add an action registry, scope permissions by tenant or role, and require approval or dry-run for refunds, bulk deletes, and external sends. Log every action and decision so you can audit and roll back. Schedule an AI agent governance review and we'll map your current design to a concrete governance layer. Our AI Agent Development practice builds governance, human-in-the-loop, and audit trails into every agent from day one.
