The Forensic Question Your Board Cannot Answer
Your AI system approved a lending decision. Or rejected an insurance claim. Or allocated a portfolio of assets. Three months later, a regulator arrives with a single question:
What did your system know at the moment it decided?
Not what your policy said it should know. Not what the documentation claims. What information was actually in front of the machine at 14:23:17 on the day the decision executed?
Most boards cannot answer this question with evidence. They can produce logs. They can produce audit trails. They can produce testimony about what should have happened. But they cannot reconstruct what actually was in front of the system: the precise state of knowledge the AI possessed at the microsecond of decision.
This gap is the Defensibility Gap. And it is a fiduciary liability your board is currently carrying but cannot discharge.
The Three Gaps: From Documentation to Control
The Defensibility Gap consists of three distinct failures in how boards oversee artificial intelligence systems. Understanding these three gaps is essential because they determine whether your governance framework will survive the forensic question.
Gap One, the Epistemic Gap: You Don't Know What Your System Knows
Your AI system processes millions of data relationships to reach a conclusion. A medical imaging AI might analyze 2.3 million pixel relationships across three imaging modalities. A compliance engine might extract patterns across 47 regulatory documents updated hourly.
Your human overseer is presented with a summary. "Recommendation: Approve." Or a confidence score. "87% likelihood of default."
The human is not reviewing the decision. They are rubber-stamping an opaque result.
This creates a radical asymmetry: The machine has processed millions of data points and relationships. The human has seen a summary. When the board later asks "what did we know when we decided?", the honest answer is: "What the machine knew, we did not."
Under the EU AI Act's transparency mandate for high-risk systems, this gap is a regulatory breach. Under the FCA's SM&CR framework, the senior manager cannot claim they "understood" a decision they lacked the information to evaluate. Under DORA, you cannot demonstrate "effective oversight" if your oversight was epistemically incomplete.
The board's fiduciary duty includes the obligation to understand what it is approving. If the human in the loop lacks epistemic parity with the machine, if they cannot access the same information the machine accessed, then governance has failed at the conceptual level.
Gap Two, the Temporal Gap: You Cannot Intervene in Real Time
Your AI system makes decisions at millisecond scale. It processes, reasons, decides, and acts in the time it takes a human to read a single sentence.
Your human oversight happens at human scale. A governance meeting. A review queue. A batch approval session.
By the time your human overseer sees the decision, it has already manifested its consequences.
Consider the speed divergence:
Data Integration: AI system processes incoming data in milliseconds. Your human review happens in hours or days.
Reasoning Iteration: AI system iterates its reasoning continuously. Your human reviews the final output.
Action Execution: AI system executes decisions simultaneously across multiple systems. Your human can intervene only after the fact.
Error Detection: AI system propagates errors at machine speed. Your human detects them when they surface in batch logs.
This temporal gap means your "human in the loop" is not actually in the loop. They are an external observer of an autonomous process. They are always operating post-hoc, reviewing outcomes that have already been committed.
Under DORA's requirement for real-time resilience, this is a critical failure. If your system can execute harmful decisions faster than your humans can detect and stop them, you have not achieved oversight. You have achieved surveillance, observing what went wrong after it happened.
Gap Three, the Procedural Gap: Your Policies Don't Reach Your Code
Your board approves an ethics framework. "All AI decisions must be subject to human review." "Material decisions require senior manager sign-off." "Risk decisions are escalated above the CFO level."
These policies sit in governance documents. Your engineering team builds the AI system according to technical specifications.
The two rarely intersect.
The result is "governance theater." Policies exist on paper. Control does not exist in the system. A human review may still occur, but only because an engineer implemented a review queue as a database table, not because the board's policy mandate reached the code layer.
This procedural gap is why many organizations can pass an internal audit (which checks whether policies exist) while simultaneously failing a forensic investigation (which checks whether controls actually function).
The board cannot discharge its liability by approving policies it does not verify are technically implemented. Governance requires that board-level risk appetite is translated into model-level constraints. Without this procedural link, your documentation describes a failure. It does not constitute control.
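What that procedural link can look like is easiest to see in code. The sketch below is a minimal illustration in Python, not a reference implementation; the threshold value, field names, and the PolicyViolation guard are assumptions invented for this example. The point is that the board-approved risk appetite lives in the execution path itself, so a decision outside the mandate cannot run without a recorded human sign-off.

    # Illustrative only: a board-approved constraint enforced at the code layer.
    # The threshold, field names, and exception are hypothetical.

    MATERIALITY_THRESHOLD_EUR = 250_000  # board-approved risk appetite, versioned with the policy


    class PolicyViolation(Exception):
        """Raised when an AI decision would execute outside the board's mandate."""


    def execute_decision(decision: dict, human_signoff: dict | None = None) -> str:
        """Allow autonomous execution only inside the board-approved envelope."""
        if decision["exposure_eur"] >= MATERIALITY_THRESHOLD_EUR:
            # Material decisions cannot run without a contemporaneous, documented sign-off.
            if not human_signoff or not human_signoff.get("rationale"):
                raise PolicyViolation(
                    "Material decision blocked: no documented senior-manager review"
                )
        # ...hand off to the execution layer here...
        return "executed"

An internal audit asks whether the 250,000 figure appears in a policy document. A forensic investigation asks whether a check like this actually sat in the execution path. Only the second closes the gap.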
Why “Human in the Loop” Has Become a Myth
"Human in the Loop" (HITL) was designed for simple automation. A system executes a task. A human reviews the output. If it looks wrong, they stop it.
This model works when the execution is slow, the output is understandable, and the human has epistemic parity with the machine.
None of these conditions hold for agentic AI.
Agentic systems reason autonomously. They make decisions at speeds humans cannot perceive. They access information humans cannot process. They execute actions that may be invisible until their consequences appear.
In this environment, HITL does not provide oversight. It provides a false sense of assurance. The board points to the human and claims "we have oversight." The regulator asks whether that human actually had the information, time, and authority to provide meaningful control. And the answer, in most cases, is no.
There are three reasons why HITL fails as governance for agentic systems:
Failure Mode One: Speed Creates a False Bottleneck
When HITL introduces latency into a high-speed system, organizations instinctively work around it. They create exceptions. "This category of decision doesn't require review." Or they make review perfunctory. "Batch approve the queue every morning."
The human review becomes an administrative checkbox, not a governance control.
This creates legal exposure. If the board later claims "we have human oversight," and it is revealed that the human review was performed in 30 seconds across 500 decisions without actually examining any individual case, the board has not discharged its duty. It has documented its negligence.
The fiduciary consequence is severe: A senior manager cannot claim they "understood" and "approved" a decision if they did not have time to actually understand it.
Failure Mode Two: Knowledge Asymmetry Creates Automation Bias
A human reviewer approves a decision made by a system they do not fully understand. This is called "automation bias" — the tendency to over-trust automated outputs even when they may be flawed.
Automation bias is strongest when the human reviewer lacks the domain expertise to interrogate the system.
Example: A pharmaceutical compliance AI checks new marketing materials against 47 regulatory guidelines in real time. The human reviewer sees "APPROVED (98% confidence)." They sign off. They lack the expertise to know whether that 98% confidence is justified, or whether the AI missed a critical nuance in a recently updated guideline.
Six months later, the regulator identifies a breach. The human signature on the approval is the only defense. It is also no defense at all — the human lacked the information to make an informed judgment.
This is the core of the Defensibility Gap: documentation of a decision exists, but the ability to defend the decision in a legal or regulatory setting has evaporated. The human's name is on the approval. The human's judgment was absent.
Failure Mode Three: Volume Exceeds Cognitive Capacity
An AI system can generate decisions at a volume no human team can review. In oncology management, imaging analytics, or trading, the machine processes thousands of cases per hour. The human team processes dozens per day.
The organization responds by sampling: "We'll review a statistical sample of decisions each month."
This works for quality control. It does not work for fiduciary governance. If each decision carries legal, ethical, or reputational weight, then sampling is insufficient. You are approving a process you have not fully audited.
Under the EU AI Act, high-risk systems must be subject to "effective oversight by natural persons." Sampling is not effective oversight. It is statistical approximation applied to a governance problem.
The Four Elements of Actual Defensibility
To close the Defensibility Gap, a board must move beyond the HITL myth and implement four non-negotiable elements. These are not policy recommendations. They are technical and governance requirements.
Element One: Active Interrogative Authority, Not Passive Approval
The human must have the authority and the tools to interrogate the AI's logic, not merely view its output.
This means:
The AI system provides a reasoning trace — the specific data points, decision rules, and alternatives the system considered.
The human can challenge the reasoning based on their expertise, knowledge of edge cases, or awareness of context the AI might lack.
The human's rationale for concurring or overriding the AI is documented contemporaneously.
A signature on an approval is worthless if the human did not have access to the machine's reasoning. A documented challenge ("I reviewed this and overrode the system because X") is everything. This is the shift from "Human in the Loop" to "Human in Command."
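As a concrete illustration of the difference, here is a minimal sketch in Python. Everything in it is hypothetical (the function name, the field names, the 20-character rationale floor); the point is that the system refuses to record an approval unless the reviewer has actually opened the reasoning trace and supplied a contemporaneous rationale.

    from datetime import datetime, timezone


    def record_review(decision_id: str, reviewer: str, verdict: str,
                      rationale: str, trace_viewed: bool) -> dict:
        """Record a concur/override only if the reviewer engaged with the reasoning trace."""
        if not trace_viewed:
            raise ValueError("Sign-off rejected: reviewer never opened the reasoning trace")
        if verdict not in ("concur", "override"):
            raise ValueError("Verdict must be an explicit concur or override")
        if len(rationale.strip()) < 20:
            raise ValueError("Sign-off rejected: no substantive rationale recorded")
        return {
            "decision_id": decision_id,
            "reviewer": reviewer,
            "verdict": verdict,
            "rationale": rationale,
            "reviewed_at": datetime.now(timezone.utc).isoformat(),
        }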
Element Two: The Contemporaneous Decision Record
Defensibility requires a Decision-State Log that timestamps and records:
The model version and weights in use at the moment of decision
The specific data inputs and their provenance (where they came from, how they were verified)
The confidence intervals and alternative paths the system rejected
The human review (who reviewed, when, what information was available to them, what they actually challenged or approved)
This log must be standardized, tamper-evident (cryptographically sealed), and permanently retrievable. It allows an auditor or regulator to forensically reconstruct: "What did the machine know? What did the human know? What actually happened?"
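One way to achieve tamper evidence, offered as a sketch under assumptions rather than a prescribed standard (the field names are illustrative, and SHA-256 hash chaining is only one of several sealing mechanisms), is to chain each log entry to the hash of the entry before it, so that any later alteration or deletion breaks the chain:

    import hashlib
    import json
    from datetime import datetime, timezone


    def seal_entry(previous_hash: str, entry: dict) -> dict:
        """Append a hash-chained Decision-State Log record (illustrative fields only)."""
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model_version": entry["model_version"],              # exact model and weights in use
            "inputs": entry["inputs"],                            # data inputs and their provenance
            "confidence": entry["confidence"],                    # confidence intervals
            "alternatives_rejected": entry["alternatives_rejected"],
            "human_review": entry["human_review"],                # who reviewed, when, what they saw
            "previous_hash": previous_hash,                       # links this entry to the one before it
        }
        payload = json.dumps(record, sort_keys=True).encode()
        record["entry_hash"] = hashlib.sha256(previous_hash.encode() + payload).hexdigest()
        return record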
Without this record, you cannot answer the forensic question. You are carrying liability you cannot defend.
Element Three: Trained, Specialized Oversight Competency
Human reviewers must be specifically trained to identify AI-specific failure modes: hallucinations, data drift, adversarial inputs, prompt injection.
This is not a role for generalist managers. It requires specialized AI curatorship. Organizations must invest in training and retain people who can interrogate machine decisions at a technical level. If your human reviewers cannot identify when an AI system is making an error, they are not providing oversight. They are providing decoration.
This also means creating a culture where human reviewers can flag problems without pressure to maintain "speed to market." The board must actively protect the integrity of the review process.
Element Four: Independent, Third-Party Validation
Regular audits by independent auditors using standardized methodologies are non-negotiable. Many audit firms are auditing AI systems they helped build — a clear conflict of interest.
The board must mandate that AI governance audits are conducted by firms with no development stake in the organization's systems. These audits must verify that:
The HITL process is actually functioning as designed
The Decision-State Logs are being maintained and are tamper-evident
The human reviewers are competent and not operating under pressure
The procedural controls are technically implemented, not just documented
This is not optional. It is the only way to close the Procedural Gap.
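Independence matters precisely because this kind of check can be re-run by someone with no stake in the system. Continuing the hypothetical hash-chain sketch from Element Two, an external auditor could replay the entire log and confirm that nothing was altered or removed after the fact:

    import hashlib
    import json


    def verify_chain(log: list, genesis_hash: str = "GENESIS") -> bool:
        """Recompute every entry hash in order; any retroactive edit breaks the chain."""
        previous_hash = genesis_hash
        for record in log:
            body = {k: v for k, v in record.items() if k != "entry_hash"}
            payload = json.dumps(body, sort_keys=True).encode()
            expected = hashlib.sha256(previous_hash.encode() + payload).hexdigest()
            if record["entry_hash"] != expected:
                return False  # tampering or a missing record detected at this entry
            previous_hash = record["entry_hash"]
        return True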
The Regulatory Convergence: Three Frameworks, One Standard
Three emerging frameworks are converging to create a new standard of fiduciary accountability for AI. Boards that do not understand these frameworks will discover they are non-compliant only after a regulator arrives.
DORA: Real-Time Resilience, Not Quarterly Compliance
The Digital Operational Resilience Act (DORA) requires financial institutions to ensure they can withstand, respond to, and recover from ICT-related disruptions in real time. This is not a compliance framework. It is a resilience framework.
Under DORA, the Defensibility Gap is a resilience failure. If an AI system makes a harmful decision, and the human oversight failed to detect it because of latency, you have not met DORA's standard. Oversight must operate at decision speed.
The implication: HITL models that rely on batch review, sampling, or after-the-fact analysis do not meet DORA's temporal standard.
SM&CR: Personal Liability for Governance Theater
Under the UK's Senior Managers and Certification Regime, senior managers can be held personally liable for failures in their areas of responsibility. The question is not whether a policy existed. The question is whether the manager took "reasonable steps" to govern.
If a manager relied on a HITL model they knew (or should have known) was insufficient, they may have breached their duty. If they could not answer the forensic question — "what did we know when we decided?" — they cannot claim they took reasonable steps.
The implication: Senior managers are now personally exposed to liability if they do not verify that their AI governance actually works.
EU AI Act: Effective Oversight, Not Human Presence
The EU AI Act is explicit in Article 14: High-risk AI systems must be designed for "effective oversight by natural persons." This includes the ability to "fully understand" the system's capacities and limitations, and to "interrupt" the system.
Effective oversight is not having a human in the process. It is having a human who is capable of command.
The implication: Any organization relying on a HITL model where the human lacks epistemic parity, temporal authority, or procedural integration with the system will be in violation of the EU AI Act.
The Fiduciary Consequence: Undischargeable Liability
The ultimate issue is this: Most boards are currently carrying AI governance liability they cannot discharge.
In traditional corporate law, directors are protected by the Business Judgment Rule if they made informed decisions in good faith. But a decision made by an AI system that the board cannot forensically reconstruct is, by definition, not an "informed" decision.
If the board relied on a HITL framework it knew was structurally inadequate — unable to provide oversight due to speed, knowledge, or scale — the board has breached its duty of care. This is a due diligence deficit that exposes the board to liability for:
Regulatory sanctions (fines, imposed remediation)
Shareholder litigation (breach of fiduciary duty claims)
Reputation damage (public disclosure of governance failure)
Operational consequences (remediation costs, customer losses)
The move toward agentic AI magnifies this risk. Agents do not just execute instructions. They reason and act autonomously. This shift from "instruction-following tools" to "reasoning agents" requires a parallel shift in governance from "process monitoring" to "architectural assurance."
Governance, in the age of agentic AI, is no longer a compliance function. It is a technical architecture function. Boards that do not understand this distinction will find themselves on the wrong side of a forensic investigation.
The Question to Ask This Week
Before your next board meeting, ask this single question:
If a regulator asked us to forensically reconstruct the complete information state behind our most material AI decision — data available, model reasoning, human review, timing, authority — could we produce that evidence within 24 hours?
If the answer is "No," or "We would have to reconstruct it from memory," you have a Defensibility Gap.
If the answer is "Yes, and here is the contemporaneous record," you have governance.
Which answer did your board give?
What’s Next
Next week, we examine the specific temporal standards being introduced under DORA and the FCA SM&CR, and how organizations can move from static compliance documentation to real-time, forensically defensible oversight.
If your board cannot answer the question above, that piece is for you.
The Roche-Review | Governance intelligence for boards that need to know the forensic standard before the regulator asks it.

