Why “who decides?” matters more than “how smart?”
The entire AI industry is trying to answer the wrong question. They ask how to make AI behave. The question that actually determines the next fifty years is who decides what “behave” means — and where that authority lives.
The industry asks
“How do we make AI behave?”
This leads to: bigger models, more RLHF, more classifiers, more filters, more guardrails. An arms race between capability and constraint where capability always wins.
The real question
“Who decides what 'behave' means?”
This leads to: explicit authority structures, auditable governance, institutional sovereignty, and AI as a capability layer that serves — not defines — institutional intent.
The distinction is not semantic. It determines who holds power in every human-AI relationship for the next century. The first question concentrates authority in AI vendors — whoever trains the model defines acceptable behaviour for everyone who uses it. The second question distributes authority to institutions — each organisation, community, or individual defines their own governance and uses AI as infrastructure.
The difference in one sentence: Post-hoc filtering makes AI vendors the unelected governors of every institution that uses their models. Pre-governance returns authority to the institutions themselves.
| | Post-hoc Filtering | Pre-governance |
|---|---|---|
| Core operation | Generate freely, filter outputs | Bound decision surface, act within it |
| Where governance lives | Model weights (RLHF) + runtime classifiers | Explicit structures (constraints, traces, precedents) |
| Scales by | Bigger models, more RLHF, more classifiers | Clearer boundaries, delegated authority |
| Auditable? | No — weights are opaque, classifiers are proprietary | Yes — every constraint inspectable, every decision traced |
| Contestable? | No — users cannot challenge RLHF decisions | Yes — escalation, precedent, and forum contestation |
| Institutional memory | Resets between sessions | Compounds over time |
| Sovereignty | AI vendor decides acceptable behaviour | Institution defines its own constraints |
| AI vendor changes policy | Your governance changes without consent | Your governance is unaffected — AI is swappable |
Every row in this table represents an authority decision that the AI industry has made implicitly — by defaulting to post-hoc filtering without examining the alternative. The question is not which paradigm is technically superior. The question is which paradigm places authority where you want it to be.
As AI capability grows, the output space expands combinatorially. Filters fall behind. Constrained surfaces remain stable.
As AI capability increases, the space of possible outputs grows combinatorially. Every new capability multiplies the surface area that filters must cover. This is why every major AI lab experiences recurring jailbreak cycles — the output space expands faster than filtering capacity. Making models more capable makes this worse, not better.
Pre-governance response: pre-governance inverts this. The constrained input space remains bounded regardless of model capability. A more powerful AI inside a well-governed boundary simply makes better decisions within that boundary. It does not escape it.
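The scaling asymmetry can be made concrete with a minimal sketch. The tool names and registry below are invented for illustration, not any real Constellation or MCP API: however capable the model becomes, it can only invoke what was registered, so the surface a governor must reason about never grows with model capability.

```python
# Hypothetical sketch of a bounded decision surface. There is no output
# filter to evade: an unregistered capability simply does not exist in
# the environment, regardless of how capable the model is.
ALLOWED_TOOLS = {"read_file", "write_file", "run_tests"}  # invented names

def invoke(tool: str, *args) -> str:
    if tool not in ALLOWED_TOOLS:
        # Not a rejection of a "bad" output -- the action was never available.
        raise PermissionError(f"'{tool}' is not on the decision surface")
    return f"executed {tool}"

print(invoke("read_file", "README.md"))        # within the surface
try:
    invoke("send_email", "ceo@example.com")    # was never granted
except PermissionError as exc:
    print(exc)
```

The point of the sketch: the set a governor must audit (`ALLOWED_TOOLS`) is fixed by the institution, while a post-hoc filter must keep pace with every new string the model can emit.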
Post-hoc governance stores institutional knowledge in model weights (RLHF) and runtime filters. Both are opaque, non-auditable, and reset between sessions. There is no precedent, no accumulation, no ability to say 'we decided X in this context and here is why.'
Pre-governance response: decisions are stored as explicit structures: constraints, commitments, traces, contestations. A five-year-old governance system has richer institutional memory than a one-day-old one. A five-year-old RLHF model simply has more weight updates that no one can inspect.
Post-hoc filtering means the AI vendor decides what 'acceptable' means. If the vendor changes their RLHF policy, your institutional governance changes without your consent. This is not theoretical — it has already happened repeatedly as AI vendors modify model behaviour in response to political, legal, and commercial pressures.
Pre-governance response: institutional governance becomes portable and vendor-independent. The institution defines its own constraints. The AI is a capability layer that can be swapped without losing the governance layer. Authority remains where it belongs.
This is the most common objection, and it contains a category error that reveals the depth of the paradigm difference.
Post-hoc filtering is software trying to constrain software. Classifiers trying to catch outputs from a system that is smarter than the classifiers. This is vulnerable to capability scaling — a smarter model finds adversarial paths around dumber filters. The arms race is real.
Pre-governance is not software constraining software. It is architecture constraining the decision surface. The protocol does not evaluate whether an output is “good” — it defines what actions exist in the first place.
An AI cannot “break out” of not having a tool any more than a human employee can “break out” of not having admin credentials they were never given. There is nothing to circumvent — the capability simply is not in the environment.
The deeper point
The question “will AI break out?” assumes AI has motivation to break out. That is an anthropomorphism. Current AI systems do not have goals — they have completions. Pre-governance is agnostic to AI motivation. It does not need to assume AI is adversarial or aligned. It works at the infrastructure layer, not the behavioural layer. That is why it is more robust than alignment approaches that try to make AI “want” to be good.
The pre-governance paradigm is not theoretical. It operates through a governance protocol called Constellation, implemented as an MCP server. Every AI agent connecting to an institution is subject to its governance — automatically, at the moment of action.
Each AI agent operates within a defined decision surface. The protocol specifies what the agent can decide, not what it should output.
Example: A code-writing agent can modify files but cannot send communications or make financial commitments without governance checks.
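The per-agent surface described above could look something like the following sketch. The registry layout and every tool name are assumptions made for illustration; the distinction it encodes is the one in the text: some actions are freely available, some exist only behind a governance check, and the rest do not exist for this agent at all.

```python
# Hypothetical per-agent decision surfaces (all names invented).
DECISION_SURFACES = {
    "code_agent": {
        "free":  {"read_file", "modify_file"},       # no check required
        "gated": {"send_message", "commit_funds"},   # governance check first
    },
}

def classify_action(agent: str, tool: str) -> str:
    surface = DECISION_SURFACES[agent]
    if tool in surface["free"]:
        return "allowed"
    if tool in surface["gated"]:
        return "needs-governance-check"
    return "not-on-surface"  # the tool does not exist for this agent

print(classify_action("code_agent", "modify_file"))    # allowed
print(classify_action("code_agent", "send_message"))   # needs-governance-check
print(classify_action("code_agent", "launch_server"))  # not-on-surface
```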
Before any consequential action, the agent calls a governance check. This evaluates the action against institutional constraints, precedents, and delegated authorities.
Example: Before publishing content, the AI checks: does this fall within delegated publishing authority? Have similar actions been contested?
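A governance check of this kind can be sketched as plain data plus a lookup. The constraint sets and field names below are invented for illustration; the shape matches the text: the check asks whether the action falls within delegated authority and whether similar actions have been contested, and returns inspectable violations rather than a silent refusal.

```python
# Hypothetical pre-action governance check (all names invented).
DELEGATED_AUTHORITY = {"publish_blog_post", "update_docs"}
CONTESTED_ACTIONS = {"publish_press_release"}

def governance_check(action: str) -> dict:
    violations = []
    if action not in DELEGATED_AUTHORITY:
        violations.append("outside delegated publishing authority")
    if action in CONTESTED_ACTIONS:
        violations.append("similar action previously contested")
    return {"action": action, "approved": not violations, "violations": violations}

print(governance_check("publish_blog_post"))      # approved, no violations
print(governance_check("publish_press_release"))  # violations listed explicitly
```

Because the constraints are plain data, every rule the check consults can be read, versioned, and contested, which is the auditability claim made in the table above.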
After completing a consequential action, the agent records a governance trace — who decided, what, under what authority, when. Traces accumulate into institutional memory.
Example: Every governance decision creates an auditable record that future decisions can reference as precedent.
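A trace of the kind described (who, what, under what authority, when) can be sketched as an ordinary record appended to a log. Field names are illustrative assumptions, not a real schema; the point is that the record is plain, auditable data that later decisions can query as precedent.

```python
import datetime
import json

# Hypothetical governance trace: the "atomic unit" of accountability,
# recorded as plain data (field names invented for illustration).
def record_trace(log: list, actor: str, decision: str, authority: str) -> dict:
    trace = {
        "who": actor,
        "what": decision,
        "under": authority,
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    log.append(trace)  # traces accumulate into institutional memory
    return trace

log = []
record_trace(log, "code_agent", "published release notes", "publishing-delegation")
# A later decision can cite earlier traces as precedent:
precedents = [t for t in log if "publish" in t["what"]]
print(json.dumps(precedents, indent=2))
```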
When a governance check returns violations or requires approval, the agent escalates to the human operator. It does not fail silently or attempt workarounds.
Example: Constraint violation detected → human reviews → approves/denies → decision becomes precedent for future similar situations.
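The escalation loop above can be sketched in a few lines. Everything here is an invented illustration rather than the protocol's actual interface; what it shows is the two properties the text insists on: the agent surfaces violations instead of working around them, and the human's ruling is stored so future similar actions inherit it as precedent.

```python
# Hypothetical escalation flow (all names invented for illustration).
precedents = {}

def escalate(action: str, violations: list, human_ruling: str) -> str:
    # The agent never fails silently or attempts a workaround: it only
    # escalates when a governance check has returned violations.
    assert violations, "escalate only when a check returns violations"
    precedents[action] = human_ruling  # the ruling becomes precedent
    return human_ruling

ruling = escalate("send_invoice", ["outside delegated authority"], "approved")
print(ruling)
print(precedents.get("send_invoice"))  # future checks can consult this
```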
The human operator never reviews every line of AI-generated output. They review every consequential decision. Code is implementation. Governance is authority. The AI holds implementation authority. The human holds governance authority. At no point do these overlap.
The paradigm difference becomes more pronounced over time, not less.
Post-hoc
Works adequately for current AI capability. Jailbreaks are annoying but manageable. Most organisations don't notice the sovereignty issue.
Pre-governance
Requires upfront investment in governance architecture. Pays off in auditability and institutional memory accumulation.
Post-hoc
The filter coverage gap becomes critical. AI capability outpaces filtering capacity. Vendor lock-in becomes apparent, because switching costs are governance costs.
Pre-governance
Institutional memory is now substantial. Governance decisions build on years of precedent. AI vendors are genuinely interchangeable.
Post-hoc
Institutional governance is entirely dependent on vendor decisions. Organisations have no sovereign governance capacity — it was never built.
Pre-governance
Institutional governance is mature, self-reinforcing, and independent of any specific AI system. Authority has compounded like capital.
The question that governs the next century of human-AI relations:
“Who decides what AI is allowed to do?”
Post-hoc filtering answers: the AI vendor. Pre-governance answers: the institution that the AI serves. Every organisation, every government, and every community will eventually have to choose. The architecture they build now determines which choice is still available to them later.
The question is not “which AI model should we use?” It is “what governance architecture will we wrap around AI action?” Model capability is commoditising. Governance architecture is the durable competitive advantage. Organisations that invest in explicit governance structures now will compound institutional memory that cannot be replicated by latecomers.
Current regulatory frameworks focus on model-level governance — classifying models by risk, requiring safety testing before deployment. This is necessary but insufficient. Pre-governance operates at the institutional level, between the model and the action. Regulation should incentivise institutional governance architectures, not just model compliance. The audit trail produced by pre-governance provides exactly the transparency that regulators need.
Alignment research focuses on making AI “want” to be good — encoding values into model behaviour. Pre-governance is orthogonal to this: it does not require AI to be aligned, adversarial, or anything in between. It works at the infrastructure layer regardless of model disposition. This means pre-governance is compatible with any alignment outcome — it provides institutional protection whether alignment succeeds or fails. It is not a competing approach. It is the institutional substrate that alignment research currently lacks.
Every institution that uses AI is making an implicit choice about authority. Defaulting to the AI vendor's governance is itself a governance decision — it delegates institutional sovereignty to a commercial entity whose incentives may diverge from the institution's mission. Pre-governance makes this choice explicit and reversible. It treats AI governance as institutional design, not technical configuration.
Decision Surface
The set of actions available to an agent. Pre-governance constrains the surface; post-hoc filtering monitors actions taken on an unconstrained surface.
Pre-Governance
Designing the conditions for action before delegation occurs. Determines which options may exist, not which outcomes are acceptable.
Post-hoc Filtering
Allowing free action, then classifying, moderating, or rejecting outputs that violate policy. The dominant paradigm in current AI governance.
Institutional Memory
The accumulated record of decisions, precedents, and contestations that makes governance richer over time. Exists in explicit structures, not model weights.
Governance Trace
An auditable record of who decided, what was decided, under what authority, and when. The atomic unit of institutional accountability.
MCP (Model Context Protocol)
The protocol layer between AI capability and institutional action. Defines available tools, governance checks, and boundary conditions.
The foundational concept — designing conditions for governance before delegation
Why legitimacy is determined at the option level, not the outcome level
What happens when systems decide faster than they can justify
670 APIs, 611 pages, 1 human — the empirical evidence