---
title: "Policy Drift in AI Agents"
date: 2026-05-07
author: Albert Mavashev
tags:
- security
- governance
- agents
- runtime-authority
- production
description: "AI agent policies drift when prompts, skills, tools, models, and workflows change. Static approval needs shadow mode, runtime limits, and audit loops."
blog: true
sidebar: false
featured: false
head:
- - meta
- name: keywords
content: AI agent policy drift, agent governance, AI agent change management, runtime authority, shadow mode, agent audit, AI policy enforcement
---

# Policy Drift in AI Agents

The agent passed review in March.

It had a narrow prompt, two approved tools, a read-only CRM integration, and a run budget that matched the expected workflow. Security signed off. Product shipped it. Support started using it.

By May, the agent is not the same system. The prompt has been updated five times. The CRM tool gained a write endpoint. A new Slack skill was installed. The model changed. The retry policy was tuned. A child-agent path was added for hard cases.

No single change looked risky enough to trigger a new architecture review. Together, they changed what the agent can do.

That is policy drift.

Policy drift happens when the approved governance envelope and the live behavior of an agent system slowly diverge. It is not always malicious. It is often the natural result of iteration.

The fix is not to stop changing agents. The fix is to make policy a runtime property, not only an approval artifact.

## Why Agents Drift Faster Than Applications

Traditional applications drift too. Dependencies update. Config changes. Feature flags move. Permissions expand.

Agents add more change surfaces:

| Drift surface | Example |
|---|---|
| Prompt instructions | "Escalate faster" becomes "send customer update automatically" |
| Tool descriptors | A read tool gains a write operation |
| Skills | A reusable workflow starts chaining more tools |
| Model behavior | A model upgrade changes tool-selection patterns |
| Memory | Prior outcomes influence future actions |
| Delegation | Parent agent starts spawning child agents |
| Retry logic | A harmless retry becomes a loop |
| Scope mapping | A workflow starts charging the wrong [tenant](/glossary#tenant) or run |

That makes static review necessary but insufficient. The review captures one snapshot. The agent keeps moving.
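One low-effort control follows directly from this table: fingerprint every drift surface at approval time and compare against the live system on a schedule. The sketch below is a minimal illustration, with invented field names standing in for a real inventory:

```python
import hashlib
import json

def envelope_fingerprint(snapshot: dict) -> str:
    """Reduce one agent snapshot to a single comparable digest."""
    canonical = json.dumps(snapshot, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode()).hexdigest()

# Hypothetical snapshots of the same agent at approval time and today.
approved = {
    "prompt_version": "v3",
    "model": "model-2026-03",
    "tools": [{"name": "crm", "operations": ["read"]}],
    "skills": ["summarize_ticket"],
    "retry_policy": {"max_attempts": 2},
}
live = {**approved, "tools": [{"name": "crm", "operations": ["read", "write"]}]}

if envelope_fingerprint(approved) != envelope_fingerprint(live):
    print("envelope drifted since approval: diff the surfaces, not just the code")
```

A digest mismatch does not say the change is dangerous. It says the approval snapshot no longer describes the live system.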

## Approved Does Not Mean Still Within Bounds

Approval usually means the agent was acceptable under known assumptions.

| Approval assumption | Drift that breaks it |
|---|---|
| Toolset is read-only | Tool now includes mutation endpoints |
| Prompt avoids side effects | Prompt now asks the agent to notify users |
| Budget is enough for expected runs | Retry policy doubles average spend |
| Agent works in one tenant | Reused in a higher-risk customer segment |
| Child agents are not used | Delegation added for complex cases |
| Human reviews before action | Automation path bypasses the review step |

The problem is not that the approval was wrong. The problem is that approval has a half-life.

For agents, the half-life can be short because behavior is distributed across code, prompts, tools, skills, retrieval, model versions, and runtime context.

## Drift Is an Observability and Enforcement Problem

You cannot manage drift with one control.

Observability tells you what changed or what happened. Enforcement decides whether the next action is still allowed. Audit connects the two into evidence.

| Layer | Drift question |
|---|---|
| Inventory | Which agent, skill, tool, or model version changed? |
| Observability | How did behavior differ after the change? |
| Shadow mode | What would the new policy have denied? |
| [Runtime authority](/glossary#runtime-authority) | Should this action proceed now? |
| Audit | Why was the action allowed, capped, or denied? |

The important point is sequencing. Drift often becomes visible only after a change ships, but side effects happen at runtime. If enforcement happens only at deployment review, the system detects drift only after the agent has already acted.

Runtime authority puts a decision point in front of each consequential action.
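Concretely, that decision point is a check-before-act wrapper around every consequential tool call. The sketch below assumes a hypothetical `authority` client with `check` and `commit` methods; it shows the sequencing, not the actual protocol API:

```python
def guarded_call(authority, action, execute):
    """Decide at runtime, act, then report actuals back for the audit trail.

    `authority.check`, `authority.commit`, and `action.constrained_by`
    are hypothetical. The point is that the decision happens per action,
    not once at deployment review.
    """
    decision = authority.check(action)
    if decision.verdict == "DENY":
        raise PermissionError(decision.reason)   # denial reason goes to audit
    if decision.verdict == "ALLOW_WITH_CAPS":
        action = action.constrained_by(decision.caps)
    result = execute(action)
    authority.commit(action, result)             # actuals close the audit loop
    return result
```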

## Common Drift Signals

Useful drift detection starts with plain operational signals.

| Signal | What it may indicate |
|---|---|
| More [reservations](/glossary#reservation) per run | Looping, retries, or new tool chains |
| More [RISK_POINTS](/glossary#risk-points) per workflow | Higher side-effect [exposure](/glossary#exposure) |
| More ALLOW_WITH_CAPS decisions | Policy pressure before hard denial |
| Higher estimate drift | Model or prompt behavior changed |
| New toolset usage | Tool descriptor, skill, or prompt changed |
| New child-agent scopes | Delegation path added |
| Denials clustered by tenant | Tenant-specific workflow or data issue |
| Webhook/audit gaps | Control-plane drift or governance telemetry degraded |

None of these signals proves a breach. They show that the agent should get a closer look.

That is why drift handling should be risk-based. A 5 percent increase in read-only token use is different from a new production mutation path.
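One way to make these signals actionable is to compare a post-change window against a pre-change baseline and alert on relative movement. A sketch under that assumption, with illustrative signal names and a single threshold that a real system would risk-weight:

```python
def drift_alerts(baseline: dict, window: dict, threshold: float = 0.25) -> list[str]:
    """Flag signals whose movement against the pre-change baseline is large.

    Signal names (reservations_per_run, allow_with_caps_rate, ...) and the
    threshold are illustrative; real thresholds should be tighter for
    mutation paths than for read-only use.
    """
    alerts = []
    for signal in sorted(set(baseline) | set(window)):
        before, after = baseline.get(signal, 0.0), window.get(signal, 0.0)
        if before == 0:
            if after > 0:  # signal absent at baseline, e.g. new toolset usage
                alerts.append(f"{signal}: new activity ({after:.2f})")
            continue
        change = (after - before) / before
        if abs(change) > threshold:
            alerts.append(f"{signal}: {change:+.0%} vs baseline")
    return alerts

print(drift_alerts(
    {"reservations_per_run": 4.0, "allow_with_caps_rate": 0.02},
    {"reservations_per_run": 9.5, "allow_with_caps_rate": 0.08,
     "child_agent_spawns": 1.2},
))
# -> ['allow_with_caps_rate: +300% vs baseline',
#     'child_agent_spawns: new activity (1.20)',
#     'reservations_per_run: +138% vs baseline']
```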

## Shadow Mode Is the Drift Lab

Shadow mode is useful for initial rollout, but it is also useful after change.

When a prompt, model, skill, tool, retry policy, or budget policy changes, replay or observe live traffic against the proposed enforcement envelope before making it blocking. The question is not "does the test suite pass?" The question is "what would production traffic do under this policy?"

Shadow mode answers:

- which actions would now be denied
- which scopes would hit limits first
- whether estimates are still calibrated
- whether ALLOW_WITH_CAPS would preserve useful work
- whether one tenant or workflow is driving the drift

That gives teams a way to tune policy with live behavior before turning every mismatch into a user-facing failure.
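The mechanics can be this simple: evaluate every live action against both the enforced policy and the proposed one, enforce only the former, and count where they disagree. A sketch, assuming a hypothetical `evaluate(policy, action)` hook that returns the verdict a policy would issue:

```python
from collections import Counter

def shadow_report(actions, current_policy, proposed_policy, evaluate):
    """Replay live actions against a proposed policy without blocking anything.

    `evaluate(policy, action)` is a hypothetical hook returning "ALLOW",
    "ALLOW_WITH_CAPS", or "DENY". Only `current_policy` is enforced; the
    proposed policy's verdicts become evidence for the rollout decision.
    """
    would = Counter()
    flips = []
    for action in actions:
        live = evaluate(current_policy, action)
        shadow = evaluate(proposed_policy, action)
        would[shadow] += 1
        if live == "ALLOW" and shadow == "DENY":
            flips.append(action)   # useful work the new policy would break
    return would, flips
```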

For the protocol mechanics, see [Dry Run and Shadow Mode Evaluation](/protocol/dry-run-shadow-mode-evaluation-in-cycles) and [Shadow Mode in Cycles](/how-to/shadow-mode-in-cycles-how-to-roll-out-budget-enforcement-without-breaking-production).

## Runtime Limits Make Drift Bounded

Drift will still happen. Runtime limits make it bounded.

If a prompt update makes the agent call search ten more times, a run budget caps the damage. If a skill update starts sending Slack messages, a toolset [RISK_POINTS](/glossary#risk-points) budget limits side effects. If a child-agent path appears, scoped delegation prevents the child from inheriting the whole parent envelope.

The useful pattern is:

```text
Change happens
-> shadow mode detects would-deny and cap pressure
-> runtime authority bounds live exposure
-> audit shows which policy decision applied
-> review updates the approved envelope or rolls back the change
```

This is not a replacement for change management. It is change management with a runtime backstop.
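A run budget with reserve-before-act semantics is enough to show the bounding behavior, including the delegation case where a child gets a carved-out slice rather than the parent's whole envelope. This is an illustrative toy, not the protocol's reservation mechanics:

```python
class RunBudget:
    """Toy budget ledger: reserve before acting, deny when the limit is hit.

    Units are abstract here; the real system tracks USD_MICROCENTS, TOKENS,
    CREDITS, and RISK_POINTS per scope with reservation/commit semantics.
    """
    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def reserve(self, estimate: int) -> None:
        if self.used + estimate > self.limit:
            raise RuntimeError("budget exhausted: drift is bounded, not silent")
        self.used += estimate

    def delegate(self, fraction: float) -> "RunBudget":
        """Carve a child envelope out of the parent's remaining headroom."""
        child_limit = int((self.limit - self.used) * fraction)
        self.reserve(child_limit)   # the slice counts against the parent
        return RunBudget(child_limit)

run = RunBudget(limit=100)
child = run.delegate(0.2)   # child can never spend more than 20
```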

## Drift Review Checklist

A practical drift review can be short:

1. **What changed?** Prompt, model, tool descriptor, skill, code, retry policy, scope mapping, or data source.
2. **Which action classes changed?** Read, write, external send, deploy, payment, delete, or delegate.
3. **Which scopes moved?** Tenant, workspace, workflow, run, agent, or toolset.
4. **Which budgets changed pressure?** [USD_MICROCENTS](/glossary#usd-microcents), [TOKENS](/glossary#tokens), [CREDITS](/glossary#credits), or RISK_POINTS.
5. **Which decisions changed?** ALLOW, ALLOW_WITH_CAPS, DENY, or denial reason.
6. **Which audit trail proves it?** Reservation, commit, event, webhook, and correlation ID records.
7. **What is the rollback?** Revert prompt, pin skill version, remove tool, freeze budget, or suspend agent.

The checklist matters because drift is often cross-functional. Security sees permissions. Finance sees spend. Product sees workflow completion. SRE sees retries and latency. Compliance sees audit evidence. Runtime authority creates a shared decision record across those views.
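That shared record can literally be one structured object that every function reads from its own angle. Field names below are illustrative, shaped by the checklist rather than any real audit schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DriftReviewRecord:
    """One record per drift review, answering the seven checklist questions."""
    what_changed: list[str]      # e.g. ["prompt", "tool_descriptor:crm"]
    action_classes: list[str]    # e.g. ["write", "external_send"]
    scopes_moved: list[str]      # e.g. ["tenant:acme", "workflow:refunds"]
    budget_pressure: dict        # e.g. {"RISK_POINTS": "+40% per workflow"}
    decision_shift: dict         # e.g. {"ALLOW_WITH_CAPS": "2% -> 8%"}
    evidence_ids: list[str]      # reservation, commit, and correlation IDs
    rollback: str                # e.g. "pin skill v1.3; freeze toolset budget"
```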

## Where Existing Security Guidance Fits

OWASP's agentic risk framework points to risks that show up as drift in production: goal hijack, tool misuse, identity and privilege abuse, supply-chain vulnerabilities, cascading failures, and rogue agent behavior.

MCP security guidance covers related tool-connectivity risks: token passthrough, session hijacking, SSRF, local server compromise, and scope minimization. The OWASP Agentic Skills checklist adds another practical point: natural-language instructions, examples, YAML, and scripts can all change behavior.

The common theme is that agent behavior is not controlled by one file or one policy.

That is why drift review should include:

| Artifact | Why it matters |
|---|---|
| Prompt and system instructions | Can change intent and tool use |
| Skill instructions and scripts | Can change multi-step behavior |
| [MCP server](/glossary#mcp-server) and tool descriptors | Can change reachable operations |
| Credentials and scopes | Can expand blast radius |
| Model version and settings | Can change action selection |
| Runtime budgets and RISK_POINTS | Bound cumulative exposure |
| Audit and event delivery | Prove what changed and what ran |

## The Takeaway

AI agents are not static deployments. They are evolving systems made of code, prompts, tools, skills, models, credentials, policies, and runtime context.

That means approval can drift.

The production answer is not to freeze iteration. It is to treat policy as something that must be observed, tested in shadow mode, enforced at runtime, and audited after every consequential action.

Static approval says what the agent was allowed to be. Runtime authority decides what the agent is still allowed to do.

## Sources

- [OWASP Top 10 for Agentic Applications 2026](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/) - agentic risks including tool misuse, identity abuse, cascading failures, and rogue behavior
- [Model Context Protocol Security Best Practices](https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices) - MCP token, session, SSRF, local-server, and scope guidance
- [OWASP MCP Security Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/MCP_Security_Cheat_Sheet.html) - MCP deployment security guidance
- [OWASP Agentic Skills Security Assessment Checklist](https://owasp.org/www-project-agentic-skills-top-10/checklist.html) - script and natural-language instruction review guidance
- [Shadow Mode in Cycles](/how-to/shadow-mode-in-cycles-how-to-roll-out-budget-enforcement-without-breaking-production) - rollout pattern for observe-only enforcement
- [Estimate Drift: The Silent Killer of Enforcement](/blog/estimate-drift-silent-killer-of-enforcement) - why estimate accuracy affects enforcement quality
- [AI Agent Risk Assessment](/blog/ai-agent-risk-assessment-score-classify-enforce-tool-risk) - scoring action risk before production
- [Webhook Event Delivery Protocol](/protocol/webhook-event-delivery-protocol) - governance signal delivery and audit mechanics