A Practical Starting Point for AI Governance

A customer recently described a situation to me that I’m seeing in a lot of organisations right now, even if it doesn’t always get named out loud.

Inside their organisation, two camps had formed. One team wanted to bring AI in quickly: get tools into people’s hands, automate real work, start seeing results. The other team wanted to slow down and put governance in place first, on the reasonable grounds that you shouldn’t let software take consequential actions without rules around it.

Here’s where it got stuck. The governance team agreed they needed a framework before anything went live, but they didn’t actually know what should go into that framework. What does “AI governance” mean in concrete terms? What do you check? What do you write down? Nobody could say. This is where the whole organisation froze. The fast-movers were blocked, the cautious team couldn’t produce the thing that would unblock them, and AI adoption stalled across the board.

It’s a classic chicken-and-egg problem. You can’t adopt safely without governance, but you can’t write the governance until you understand what you’re governing. Both teams are right, and that’s exactly why it’s so hard to break.

I’ve been thinking about this deadlock a lot, and I recently came across a research paper that offers one of the more genuinely fresh ways out that I’ve seen. It’s called “Governing Actions, Not Agents” by Jakob Salfeld-Nebgen. It doesn’t solve the entire governance puzzle, but it reframes the problem in a way that makes a big chunk of it suddenly tractable. That reframing is what I want to walk through here, because I think it’s a real innovation.

The Usual Approach

When people imagine governing an AI system, they usually picture watching what it’s thinking, monitoring its reasoning, inspecting its plans, trying to catch bad intentions before they turn into bad outcomes. Most current tools work roughly this way. They sit alongside the AI, watch the actions it tries to take, and block the ones that look dangerous.

This is useful, but it has a built-in weakness. Watching an AI’s behaviour tells you what it’s doing, not whether what it’s doing is actually safe in the real world. Consider an AI agent that’s about to deploy software to production. The action itself looks completely normal: it’s calling the right tool with sensible-looking settings. Nothing about the action looks wrong. But whether it’s safe depends on facts that live outside the AI entirely: Did the tests pass? Did a human review the code? Did the security scan come back clean?

The AI’s own behaviour doesn’t contain those answers. So a system that only watches behaviour is looking in the wrong place. The same is true in other domains. An AI suggesting a prescription might be using the right tool with plausible details, but whether it’s safe depends on whether the patient’s records were checked, whether there’s a dangerous drug interaction, and whether the prescriber is actually licensed. None of that is visible in the AI’s reasoning. It lives in other systems.

This is the gap the paper is built around. And it’s a gap that most safety approaches simply can’t reach, because the information they’d need isn’t in front of them.

The Core Idea

Here’s the move that makes the paper interesting. Instead of inventing a brand-new way to govern AI, it looks at how human institutions have governed powerful, independent decision-makers such as doctors, judges, and financial officers for centuries, and asks what those institutions actually do.

The answer is striking: we don’t govern these people by reading their minds. A doctor is free to think whatever they like and reach their own conclusions. We govern them at the moment of consequential action by demanding evidence. To prescribe a controlled medication, a doctor needs a verified patient record, a check for dangerous drug interactions, and a valid licence. Each of those is confirmed by a different, independent source: the records database, the pharmacy system, the licensing board. No single party, not even the doctor, gets to wave the action through on their own say-so.

The paper’s central insight is that this same pattern can be turned into a model for governing AI. Don’t try to police the AI’s reasoning. Let it think and plan freely. But at the point where it tries to do something irreversible and consequential, require independently verified evidence that the action is safe before it’s allowed to happen.

That is a shift from governing the agent to governing the action, and it’s the heart of the innovation. It sounds simple once stated, but it quietly dissolves the hardest part of the problem. You no longer need to peer inside an AI’s mind to govern it well. You just need to be clear about which actions matter, and what evidence each one requires.

How It Works in Practice

The paper lays out a working model for this, and it’s worth understanding the shape of it in plain terms.

The key design choice is that the AI holds no power to carry out a governed action by itself. In a normal setup, the AI has the keys. It can deploy the code or issue the order directly. In this model, those keys are taken away for the high-stakes actions. The AI becomes something more like a courier. It can go and gather the evidence, but it can’t unlock the door on its own.

Roughly, the flow goes like this. The AI announces what it wants to do. A central checkpoint gives back a unique ticket for that specific request and a list of the evidence required. The AI then goes out to the relevant independent sources and collects a signed confirmation from each. The testing service confirms the tests passed. The review service confirms a human approved it. And so on. The AI brings all of this back to the checkpoint, which verifies that each confirmation is genuine and that it hasn’t expired. Only if everything checks out against a clear, pre-written rule does the action go ahead. And every decision is written into a permanent record that anyone can later inspect and re-verify.
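To make that flow concrete, here is a minimal Python sketch of it. This is my own illustration, not code from the paper. Every name in it (RULES, SOURCE_KEYS, issue_ticket, sign_confirmation, checkpoint_allows, the five-minute freshness window) is hypothetical. It also makes one loud simplification: it uses HMAC with a shared secret from the standard library as a stand-in for real public-key signatures, which means the checkpoint in this toy version could sign on a source’s behalf. A real deployment would presumably give the checkpoint only each source’s public key.

```python
import hashlib
import hmac
import json
import secrets
import time

# Hypothetical evidence sources and their signing keys. HMAC with a shared
# secret is a stand-in here; with real public-key signatures the checkpoint
# would hold only each source's public key. The AI never sees these keys.
SOURCE_KEYS = {
    "test_service": secrets.token_bytes(32),
    "review_service": secrets.token_bytes(32),
}

# Hypothetical pre-written rule: governed action -> {source: required claim}.
RULES = {
    "deploy_to_production": {
        "test_service": "tests_passed",
        "review_service": "human_approved",
    }
}

MAX_AGE_SECONDS = 300  # assumed freshness window for evidence
TICKETS = {}           # checkpoint state: ticket -> action it was issued for


def _sign(source, ticket, claim, issued_at):
    """Signature over the ticket, source, claim, and timestamp together."""
    payload = json.dumps([ticket, source, claim, issued_at]).encode()
    return hmac.new(SOURCE_KEYS[source], payload, hashlib.sha256).hexdigest()


def issue_ticket(action):
    """Checkpoint: hand back a unique ticket and the evidence required."""
    ticket = secrets.token_hex(16)
    TICKETS[ticket] = action
    return ticket, RULES[action]


def sign_confirmation(source, ticket, claim):
    """Evidence source: sign a claim that is bound to one specific ticket."""
    issued_at = time.time()
    return {
        "source": source,
        "ticket": ticket,
        "claim": claim,
        "issued_at": issued_at,
        "sig": _sign(source, ticket, claim, issued_at),
    }


def checkpoint_allows(ticket, confirmations, now=None):
    """Checkpoint: allow only if every required confirmation is present,
    bound to this ticket, genuinely signed, and still fresh."""
    now = time.time() if now is None else now
    action = TICKETS.get(ticket)
    if action is None:
        return False  # unknown ticket
    by_source = {c["source"]: c for c in confirmations}
    for source, required_claim in RULES[action].items():
        c = by_source.get(source)
        if c is None or c["ticket"] != ticket or c["claim"] != required_claim:
            return False  # missing, reused from another request, or wrong claim
        expected = _sign(source, c["ticket"], c["claim"], c["issued_at"])
        if not hmac.compare_digest(expected, c["sig"]):
            return False  # signature does not verify
        if now - c["issued_at"] > MAX_AGE_SECONDS:
            return False  # evidence has expired
    return True
```

In this sketch the AI’s role is only to call issue_ticket, carry the ticket to each source, and hand the resulting confirmations to checkpoint_allows. It never touches SOURCE_KEYS, which is the courier idea in miniature.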

A few things make this clever rather than just bureaucratic:

The AI can’t fake the evidence. It’s gathering confirmations that are cryptographically signed by other parties, and it doesn’t hold their signing keys. It can carry the evidence, but it can’t forge it.
The evidence is tied to one specific request. A confirmation collected for one action can’t be quietly reused to wave through a different one. This stops the digital equivalent of recycling an old approval.
The evidence has a use-by date. Because the real world changes, the action has to be backed by fresh evidence, not something gathered weeks ago.
The whole thing is independently checkable. Because every decision and every piece of evidence is recorded in a tamper-evident log, an outside auditor can reconstruct exactly why an action was allowed, without having to take anyone’s word for it.
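The last of those properties, the tamper-evident log, can be illustrated with a simple hash chain. The sketch below is one common way to get tamper evidence and is an assumption on my part, not necessarily the mechanism the paper uses. The names append_entry, verify_log, and GENESIS are hypothetical. Each entry’s hash covers the previous entry’s hash, so editing any past decision breaks every link after it. Note that a chain like this only helps an outside auditor if the latest hash is also kept somewhere the log’s owner cannot quietly rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, decision):
    """Hash of one entry: covers the decision and the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "decision": decision}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log, decision):
    """Append a decision record, chaining it to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "prev": prev_hash,
        "decision": decision,
        "hash": _digest(prev_hash, decision),
    })


def verify_log(log):
    """Auditor: recompute every hash and check that each link still holds."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False  # chain broken or entry removed
        if entry["hash"] != _digest(entry["prev"], entry["decision"]):
            return False  # entry contents were changed after the fact
        prev_hash = entry["hash"]
    return True
```

An auditor who runs verify_log over the whole record does not need to trust whoever stored it, only the arithmetic.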

Back to the Problem

Remember the two stuck teams: one wanting speed, one wanting governance, neither able to move.

What this model offers is a way to stop treating those as opposites. The fast team’s real desire isn’t to be reckless; the governance team’s real fear isn’t AI in general. The “govern the action, not the agent” reframe gives each of them most of what they want. The AI can be adopted quickly and operate freely for the vast majority of its work, while a defined set of consequential actions gets gated behind independent evidence.

It also gives the governance team a concrete answer to the question that had them frozen: what goes in the framework? Instead of trying to write rules for an AI’s entire inner life, the task becomes far more bounded and answerable. Make a list of the actions that are genuinely high-stakes and hard to undo. For each one, decide what evidence would have to be true for it to be safe. Decide who the trustworthy source of each piece of evidence is. Write the rule down plainly. That’s a framework a team can actually start building, instead of an abstract ideal they can’t get traction on.
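One way to picture what that written-down framework could look like is as a small table of governed actions, each with the evidence it needs and the source trusted to supply it. The Python sketch below is purely illustrative. The class names, the action names, and the claim and source labels are all hypothetical, borrowed from the deployment and prescription examples earlier in this post rather than from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRequirement:
    claim: str           # what has to be true
    trusted_source: str  # who is trusted to confirm it


@dataclass(frozen=True)
class GovernedAction:
    name: str
    requirements: tuple[EvidenceRequirement, ...]


# Hypothetical framework: only high-stakes, hard-to-undo actions are listed.
FRAMEWORK = {
    action.name: action
    for action in (
        GovernedAction("deploy_to_production", (
            EvidenceRequirement("tests_passed", "test_service"),
            EvidenceRequirement("human_approved", "review_service"),
            EvidenceRequirement("scan_clean", "security_scanner"),
        )),
        GovernedAction("issue_prescription", (
            EvidenceRequirement("patient_record_verified", "records_database"),
            EvidenceRequirement("no_dangerous_interaction", "pharmacy_system"),
            EvidenceRequirement("licence_valid", "licensing_board"),
        )),
    )
}


def missing_evidence(action_name, verified_claims):
    """Return the requirements not yet met for an action.

    verified_claims is a set of (claim, source) pairs that have already
    been verified. Actions absent from FRAMEWORK are ungoverned, so
    nothing is required of them.
    """
    action = FRAMEWORK.get(action_name)
    if action is None:
        return []
    return [
        req for req in action.requirements
        if (req.claim, req.trusted_source) not in verified_claims
    ]
```

The design choice worth noticing is that anything not in the table is simply left alone. That is what lets the AI operate freely for most of its work while the short list of consequential actions stays gated.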

I want to be fair about the limits here, because the paper takes the same view. This model doesn’t decide which actions are high-stakes. That’s a judgement each organisation has to make, and it’s the real starting work. It doesn’t guarantee the rule you wrote is the right rule. And it can’t catch an AI that pursues a bad goal through a series of individually legitimate, properly evidenced steps. It’s not the whole of AI governance. But it is a clear, buildable answer for one of the hardest and highest-stakes corners of it, and a far better starting point than a blank page.

If your organisation is caught in a version of this deadlock, the question worth putting on the table at your next meeting isn’t “should we govern AI or adopt it?” It’s the more useful one this paper points to: which of the actions our AI might take are the ones we can’t take back, and what would we need to be true before we let it take them?

Answer that, and you’ve started building your framework.