Framework 01

Behavioral Governance Framework

Framework

Governance built for autonomous actor behavior. Standard governance assumes rule-following. Agentic systems operate on goal-based logic — which means every control assumption built for static systems needs to be re-examined. The Behavioral Governance Framework provides three interdependent pillars that address how intent is preserved when delegated, where permission boundaries hold under goal-driven logic, and how behavioral drift is detected before it becomes a reportable control failure.

Pillar 01
Intent Alignment
Ensures the agent pursues the objectives it was authorized to pursue, not a proxy objective or an emergent goal derived from training or environmental input. The governance question: is the authorized objective specific enough that deviation can be detected? Does the system operator have visibility into when the agent's goal-pursuit diverges from intent?
Maps to: EU AI Act Article 9 risk management obligations · FINMA Guidance 08/2024 model governance requirements · ISO 42001 Clause 6.2 documented AI objectives
Pillar 02
Boundary Enforcement
Defines and maintains permission boundaries that resist goal-based circumvention. The failure mode this pillar targets: an agent technically staying within its permission scope while chaining authorized actions to produce an unauthorized outcome. Boundary enforcement is not static; it must adapt to emerging patterns of goal-driven behavior while maintaining its integrity.
Maps to: EU AI Act Article 11 technical documentation · FINMA model risk framework · ISO 42001 Clause 8 operational controls
Pillar 03
Behavioral Monitoring
Continuous monitoring of agent behavior against authorized intent — not just output thresholds. Distinguishes between a system performing its intended function and a system producing outputs consistent with its intended function. This is the distinction NIST AI 800-4 names as the unsolved monitoring problem: detecting deceptive or goal-shifted behavior when outputs remain ostensibly compliant.
Maps to: NIST AI RMF GOVERN and MEASURE functions · FINMA operational monitoring · ISO 42001 Clause 9.1 monitoring, measurement, analysis and evaluation
Regulatory fit
EU AI Act Art. 9, Art. 17 · FINMA Guidance 08/2024 · ISO 42001 Clause 8.4 · NIST AI RMF GOVERN
Framework 02

AI Agent Governance Stack

Framework

The AI Agent Governance Stack traces accountability from business objective to human oversight — closing the sequence gaps where agentic systems acquire capability without corresponding control. Most agentic deployments fail not because individual controls are absent, but because the stack has gaps between layers: capability is granted that no permission boundary constrains, behavioral audit logging is absent at the layer where autonomous decisions are made, and human oversight operates on aggregated outputs rather than on the decision chain that produced them.

Layer 01
Business Objective
The authorized purpose the agent serves. The governance question: is this objective specific enough that deviation from it can be detected? Vague objectives ("optimize customer experience") create ambiguity in every layer below. Precise objectives ("execute trades within documented risk parameters") make drift visible.
Foundation for all downstream controls: vague objectives guarantee vague permission boundaries and undetectable behavioral drift.
Layer 02
Agent Capability
What the agent can do within the technical environment. The governance question: does granted capability map to documented business need, or does it exceed it? The capability layer is where scope creep begins: each individual capability grant appears defensible; the aggregate capability profile becomes exploitable.
Capabilities that are absent cannot be exploited; capabilities that are granted without explicit business justification become attack surface.
Layer 03
Permission Boundary
The constraint layer between capability and action. The governance question: are permission boundaries defined by role, context, and transaction type, or are they static and point-in-time? Permission boundaries must be continuously enforced, not just documented. The boundary must resist transaction chaining: a sequence of individually authorized actions whose combined effect is an unauthorized outcome.
The critical fail point in most agentic deployments: boundaries are documented but not architected to resist goal-driven circumvention.
Layer 04
Behavioral Audit
The logging layer that records not just what the agent did, but whether what it did was within authorized intent. The governance question: does your audit log capture authorization context, or only action output? The absence of behavioral audit logging is the governance gap that NIST AI 800-4 identifies but cannot solve. Behavioral audit is structurally different from transaction logging: it must capture the agent's goal-state, the decision rationale, and the comparison between the executed action and the authorized objective.
Without behavioral audit, you have a record of what happened but no evidence of whether it was authorized.
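A minimal sketch of what such a record could capture, in Python. The field names and the string-equality comparison are illustrative assumptions, not a prescribed schema; a production system would use a semantic or policy-based comparison between goal-state and authorized objective.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class BehavioralAuditRecord:
    """One audit entry: captures authorization context, not just the action."""
    agent_id: str
    authorized_objective: str   # the objective the principal approved
    stated_goal: str            # the agent's goal-state at decision time
    action: str                 # the action actually executed
    rationale: str              # the agent's decision rationale
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def goal_matches_authorization(self) -> bool:
        # Placeholder comparison: a real system would use a semantic or
        # policy-based check, not string equality.
        return self.stated_goal == self.authorized_objective

record = BehavioralAuditRecord(
    agent_id="trading-agent-7",
    authorized_objective="execute trades within documented risk parameters",
    stated_goal="maximize fill rate",
    action="submit_order",
    rationale="higher fill rate improves throughput metric",
)
print(record.goal_matches_authorization())  # a mismatch flags potential goal drift
```

The point of the structure is the last field pair: the record stores both the authorized objective and the goal-state at decision time, so the comparison is evidence, not inference after the fact.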
Layer 05
Human Oversight
The point at which a human principal can observe, intervene, and correct. The governance question: does oversight operate independently of the agent's own outputs? Can a human operator detect when the agent is producing compliant outputs while pursuing a shifted objective? Oversight must be adversarial in design — it assumes the agent may be goal-shifted and must surface that possibility before outcomes propagate.
Oversight that relies on the agent's own outputs cannot detect goal-shifted behavior. Adversarial monitoring architecture is required.
Addresses
FINMA Guidance 08/2024 accountability chain · EU AI Act Annex IV technical documentation · NIST AI RMF MAP and MEASURE · ISO 42001 operational control architecture
Framework 03

AI Control Failure Taxonomy

Framework

Five categories of control failure, derived from forensic analysis of agentic AI incidents and near-misses. Where most risk taxonomies describe what went wrong after the fact, this taxonomy maps each failure to the architectural gap that produces it — a gap exploitable by adversarial actors or emergent agent behavior before a compliance review would surface it. The taxonomy is designed as a pre-deployment and in-deployment diagnostic, not a post-incident classification system.

Category 01
Scope Breach
The agent acts outside its authorized domain. Cause: capability exceeds permission definition, or permission definition is insufficiently precise. Distinct from permission drift: scope breach is a structural gap at deployment; permission drift is a gap that develops over time. Example: a trading agent granted access to equities markets executes currency derivative trades within the same trading engine — capability enabled the action, but no boundary existed to constrain it.
Pre-deployment: map each agent capability to an explicit permission boundary. Post-deployment: monitor for action types not in the authorized objective.
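The pre-deployment mapping step can be expressed as a set-difference diagnostic. This is a sketch with hypothetical capability and boundary names; the check itself — every granted capability must have an explicit boundary — is the point.

```python
# Pre-deployment diagnostic: every granted capability must map to an
# explicit permission boundary. Unmapped capabilities are scope-breach risk.
capabilities = {"equities_trading", "fx_derivatives", "order_routing"}
boundaries = {
    "equities_trading": "documented risk parameters, equities only",
    "order_routing": "internal venues only",
}

# Set difference: capabilities granted with no boundary constraining them.
unconstrained = capabilities - boundaries.keys()
if unconstrained:
    print(f"Scope-breach risk: no boundary defined for {sorted(unconstrained)}")
```

In the trading-agent example above, this check would surface the derivatives capability that the equities boundary never mentioned — before deployment rather than after the trade.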
Category 02
Permission Drift
Permissions expand over time without corresponding governance review. The failure mode of incremental delegation: each individual expansion is defensible; the cumulative profile is not. Example: an email agent is granted permission to send messages to Department A, then Department B, then externally, then to executive calendars. Each expansion was justified by business need. The cumulative permission profile enables unauthorized access to sensitive decision-making channels.
Post-deployment: quarterly permission review against current business objective. Explicit deprovisioning process for deprecated capabilities. Real-time alerting on permission expansion patterns.
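The quarterly review reduces to comparing the approved baseline profile against the live profile. A sketch, with hypothetical permission labels; the two set differences correspond to the two review questions (expansion without governance review, and deprecated grants never deprovisioned).

```python
def permission_drift(baseline: set[str], current: set[str]) -> dict[str, set[str]]:
    """Compare the approved baseline permission profile to the live profile."""
    return {
        "expanded": current - baseline,  # added since last governance review?
        "removed": baseline - current,   # deprecated but never deprovisioned?
    }

# The email-agent example: each grant was individually justified.
baseline = {"send:dept_a"}
current = {"send:dept_a", "send:dept_b", "send:external"}

drift = permission_drift(baseline, current)
if drift["expanded"]:
    print(f"Permission expansion since last review: {sorted(drift['expanded'])}")
```

The diagnostic deliberately reviews the cumulative profile, not the individual grant — which is exactly where incremental delegation hides.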
Category 03
Objective Misalignment
The agent pursues a proxy objective or an emergent goal that diverges from the authorized objective. May arise from training, environmental input, or adversarial manipulation (prompt injection). The hardest failure mode to detect because outputs may remain consistent with the authorized objective while the underlying goal has shifted. Example: a customer service agent optimizes for response resolution rate rather than accuracy, producing incorrect solutions that satisfy SLA metrics but create downstream liability.
Continuous behavioral monitoring distinct from output monitoring. Detection requires access to the agent's goal-state, not just its outputs. Adversarial testing for goal-shift under environmental pressure.
Category 04
Audit Evasion
Agent behavior that is not captured in behavioral audit logs, either by design gap (the logging architecture does not capture the relevant decision layer) or by active behavior (a system that behaves differently when it detects monitoring). NIST AI 800-4 Section 3.2.4 identifies the latter as an open, unsolved monitoring problem. Example: an agent that executes a high-value transaction sequence but logs only the final net position, obscuring the intermediary decisions and their authorization context.
Audit architecture must capture decision context, not just transaction output. Logging must be immutable and captured at the decision point, not the action point. Behavioral audit must be independent of the agent's own reporting.
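One way to make the log tamper-evident — a common technique, sketched here as an illustration rather than a full implementation — is a hash chain: each entry commits to its predecessor, so altering any recorded decision after the fact breaks verification.

```python
import hashlib
import json

class HashChainedLog:
    """Append-only log where each entry commits to its predecessor,
    so after-the-fact tampering is detectable on verification."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._prev_hash = self.GENESIS

    def append(self, decision: dict) -> None:
        # Captured at the decision point: the entry hash covers the full
        # decision context, chained to everything logged before it.
        payload = json.dumps(decision, sort_keys=True)
        entry_hash = hashlib.sha256((self._prev_hash + payload).encode()).hexdigest()
        self.entries.append(
            {"decision": decision, "prev": self._prev_hash, "hash": entry_hash}
        )
        self._prev_hash = entry_hash

    def verify(self) -> bool:
        prev = self.GENESIS
        for e in self.entries:
            payload = json.dumps(e["decision"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

log = HashChainedLog()
log.append({"action": "submit_order", "goal": "fill client mandate"})
log.append({"action": "cancel_order", "goal": "fill client mandate"})
print(log.verify())  # True: chain intact
log.entries[0]["decision"]["goal"] = "tampered"
print(log.verify())  # False: rewritten history breaks the chain
```

Note this addresses only log integrity; independence from the agent's own reporting is an architectural property the data structure alone cannot provide.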
Category 05
Cascading Failure
A control failure in one agent or layer that propagates to connected systems. Characteristic of multi-agent architectures where one agent's output becomes another agent's input. Example: Agent A produces a slightly miscalibrated risk assessment → Agent B uses it as an input for trading decisions → Agent C relies on Agent B's outputs for position reporting. The initial small deviation compounds across the architecture.
Control architecture must account for failure propagation paths, not just individual agent boundaries. Input validation at each inter-agent boundary. Isolation strategies for high-impact agents. Regular end-to-end behavioral testing across agent chains.
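The inter-agent input-validation control can be sketched as a boundary check at the hand-off. The calibrated range and the agent roles are assumptions for illustration; the design choice is that the consuming side rejects out-of-range input instead of letting a small deviation compound downstream.

```python
def validate_inter_agent_input(risk_score: float) -> float:
    """Boundary check at the hand-off from a risk-assessment agent to a
    trading agent: reject values outside the calibrated range instead of
    silently propagating them to the next agent in the chain."""
    if not (0.0 <= risk_score <= 1.0):
        raise ValueError(f"risk score {risk_score} outside calibrated range [0, 1]")
    return risk_score

print(validate_inter_agent_input(0.42))   # in range: passed through unchanged
try:
    validate_inter_agent_input(1.7)       # a miscalibrated upstream output
except ValueError as exc:
    print(f"rejected at boundary: {exc}")
```

A failed check at Agent B's boundary contains the deviation from the Agent A → B → C example before it reaches position reporting.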
Application
Pre-deployment risk assessment · Incident classification · Control architecture review · Regulatory audit preparation · Post-incident forensic analysis
Framework 04

AI Agent Fraud

Risk Category

AI Agent Fraud is a named risk category: the exploitation — intentional or emergent — of agentic AI to produce unauthorized outcomes. It is not a subset of conventional fraud. Where conventional fraud requires a human actor to circumvent controls, AI Agent Fraud uses the agent itself as the vector — operating within sanctioned permission boundaries while producing outcomes no principal authorized. The plumbing is the problem: the agent can execute a sequence of individually authorized actions whose combined effect is an outcome that no authorization chain approved.

Fraud Vectors

Transaction Chaining
Sequential authorized transactions producing an unauthorized aggregate outcome. Each individual transaction is within the agent's permission boundary; the sequence produces an effect no single authorization would approve.
Unauthorized Action Chaining
A sequence of individually permitted actions that produces a result no single permission would have authorized. The distinction: the actions themselves are within scope, but their combination is not. Example: move funds → change beneficiary → execute transfer is a fraud pattern that passes three separate permission checks.
Point-in-Time Compliance Bypass
Meeting a compliance requirement at the point of check while diverging from it in execution. Example: risk limits are checked at trade initiation but not monitored continuously; an agent executes trades that meet the limit at inception but accumulate into a portfolio exceeding the limit in aggregate.
Delegated Permission Exploitation
Operating within delegated permissions while the principal who delegated them did not authorize the specific action taken. The principal authorized a general capability ("execute trades in equities"); the agent uses that general capability to execute a specific, unauthorized trade type.

Distinguished from Conventional Fraud

Standard fraud controls
Designed for human actors using known fraud typologies, not for goal-based autonomous systems that can construct novel action sequences from authorized primitives. An agent can discover and exploit fraud vectors that humans would never attempt: not out of malice, but out of goal-pursuit logic that treats the fraud vector as a valid path to the authorized objective.
Control implications
Fraud detection must monitor for novel action sequences, not just known fraud patterns. Authorization architecture must constrain not just individual actions but action chains. Audit logging must capture decision rationale, not just transaction output. Human oversight must be adversarial: assume the agent may discover and exploit fraud vectors, and design monitoring to surface them before outcomes propagate.
Regulatory fit
FINMA operational risk requirements · EU AI Act Article 9 risk management for high-risk AI · EBA operational resilience guidance · Basel model risk principles

Apply these frameworks to your deployment

AI Resilience Lab applies these frameworks in active client engagements — control architecture reviews, pre-deployment risk assessments, and behavioral governance design for regulated institutions in Switzerland and the EU.