Why “who decides?” matters more than “how smart?”
The entire AI industry is trying to answer the wrong question. They ask how to make AI behave. The question that actually determines the next fifty years is who decides what “behave” means — and where that authority lives.
The industry asks
“How do we make AI behave?”
This leads to: bigger models, more RLHF, more classifiers, more filters, more guardrails. An arms race between capability and constraint where capability always wins.
The real question
“Who decides what 'behave' means?”
This leads to: explicit authority structures, auditable governance, institutional sovereignty, and AI as a capability layer that serves — not defines — institutional intent.
The distinction is not semantic. It determines who holds power in every human-AI relationship for the next century. The first question concentrates authority in AI vendors — whoever trains the model defines acceptable behaviour for everyone who uses it. The second question distributes authority to institutions — each organisation, community, or individual defines their own governance and uses AI as infrastructure.
The difference in one sentence: Post-hoc filtering makes AI vendors the unelected governors of every institution that uses their models. Pre-governance returns authority to the institutions themselves.
| | Post-hoc Filtering | Pre-governance |
|---|---|---|
| Core operation | Generate freely, filter outputs | Bound decision surface, act within it |
| Where governance lives | Model weights (RLHF) + runtime classifiers | Explicit structures (constraints, traces, precedents) |
| Scales by | Bigger models, more RLHF, more classifiers | Clearer boundaries, delegated authority |
| Auditable? | No — weights are opaque, classifiers are proprietary | Yes — every constraint inspectable, every decision traced |
| Contestable? | No — users cannot challenge RLHF decisions | Yes — escalation, precedent, and forum contestation |
| Institutional memory | Resets between sessions | Compounds over time |
| Sovereignty | AI vendor decides acceptable behaviour | Institution defines its own constraints |
| AI vendor changes policy | Your governance changes without consent | Your governance is unaffected — AI is swappable |
Every row in this table represents an authority decision that the AI industry has made implicitly — by defaulting to post-hoc filtering without examining the alternative. The question is not which paradigm is technically superior. The question is which paradigm places authority where you want it to be.
As AI capability grows, the output space expands combinatorially. Filters fall behind. Constrained surfaces remain stable.
As AI capability increases, the space of possible outputs grows combinatorially. Every new capability multiplies the surface area that filters must cover. This is why every major AI lab experiences recurring jailbreak cycles — the output space expands faster than filtering capacity. Making models more capable makes this worse, not better.
Pre-governance response: pre-governance inverts this. The constrained input space remains bounded regardless of model capability. A more powerful AI inside a well-governed boundary simply makes better decisions within that boundary. It does not escape it.
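The scaling asymmetry can be made concrete with a minimal sketch. The tool names and registry below are invented for illustration, not any real Constellation or MCP API: however capable the model becomes, it can only invoke what was registered, so the surface a governor must reason about never grows with model capability.

```python
# Hypothetical sketch of a bounded decision surface. There is no output
# filter to evade: an unregistered capability simply does not exist in
# the environment, regardless of how capable the model is.
ALLOWED_TOOLS = {"read_file", "write_file", "run_tests"}  # invented names

def invoke(tool: str, *args) -> str:
    if tool not in ALLOWED_TOOLS:
        # Not a rejection of a "bad" output -- the action was never available.
        raise PermissionError(f"'{tool}' is not on the decision surface")
    return f"executed {tool}"

print(invoke("read_file", "README.md"))        # within the surface
try:
    invoke("send_email", "ceo@example.com")    # was never granted
except PermissionError as exc:
    print(exc)
```

The point of the sketch: the set a governor must audit (`ALLOWED_TOOLS`) is fixed by the institution, while a post-hoc filter must keep pace with every new string the model can emit.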
Post-hoc governance stores institutional knowledge in model weights (RLHF) and runtime filters. Both are opaque, non-auditable, and reset between sessions. There is no precedent, no accumulation, no ability to say 'we decided X in this context and here is why.'
Pre-governance response: decisions are stored as explicit structures: constraints, commitments, traces, contestations. A five-year-old governance system has richer institutional memory than a one-day-old one. A five-year-old RLHF model simply has more weight updates that no one can inspect.
Post-hoc filtering means the AI vendor decides what 'acceptable' means. If the vendor changes their RLHF policy, your institutional governance changes without your consent. This is not theoretical — it has already happened repeatedly as AI vendors modify model behaviour in response to political, legal, and commercial pressures.
Pre-governance response: institutional governance becomes portable and vendor-independent. The institution defines its own constraints. The AI is a capability layer that can be swapped without losing the governance layer. Authority remains where it belongs.
This is the most common objection, and it contains a category error that reveals the depth of the paradigm difference.
Post-hoc filtering is software trying to constrain software. Classifiers trying to catch outputs from a system that is smarter than the classifiers. This is vulnerable to capability scaling — a smarter model finds adversarial paths around dumber filters. The arms race is real.
Pre-governance is not software constraining software. It is architecture constraining the decision surface. The protocol does not evaluate whether an output is “good” — it defines what actions exist in the first place.
An AI cannot “break out” of not having a tool any more than a human employee can “break out” of not having admin credentials they were never given. There is nothing to circumvent — the capability simply is not in the environment.
The deeper point
The question “will AI break out?” assumes AI has motivation to break out. That is an anthropomorphism. Current AI systems do not have goals — they have completions. Pre-governance is agnostic to AI motivation. It does not need to assume AI is adversarial or aligned. It works at the infrastructure layer, not the behavioural layer. That is why it is more robust than alignment approaches that try to make AI “want” to be good.
The pre-governance paradigm is not theoretical. It operates through a governance protocol called Constellation, implemented as an MCP server. Every AI agent connecting to an institution is subject to its governance — automatically, at the moment of action.
Each AI agent operates within a defined decision surface. The protocol specifies what the agent can decide, not what it should output.
Example: A code-writing agent can modify files but cannot send communications or make financial commitments without governance checks.
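The per-agent surface described above could look something like the following sketch. The registry layout and every tool name are assumptions made for illustration; the distinction it encodes is the one in the text: some actions are freely available, some exist only behind a governance check, and the rest do not exist for this agent at all.

```python
# Hypothetical per-agent decision surfaces (all names invented).
DECISION_SURFACES = {
    "code_agent": {
        "free":  {"read_file", "modify_file"},       # no check required
        "gated": {"send_message", "commit_funds"},   # governance check first
    },
}

def classify_action(agent: str, tool: str) -> str:
    surface = DECISION_SURFACES[agent]
    if tool in surface["free"]:
        return "allowed"
    if tool in surface["gated"]:
        return "needs-governance-check"
    return "not-on-surface"  # the tool does not exist for this agent

print(classify_action("code_agent", "modify_file"))    # allowed
print(classify_action("code_agent", "send_message"))   # needs-governance-check
print(classify_action("code_agent", "launch_server"))  # not-on-surface
```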
Before any consequential action, the agent calls a governance check. This evaluates the action against institutional constraints, precedents, and delegated authorities.
Example: Before publishing content, the AI checks: does this fall within delegated publishing authority? Have similar actions been contested?
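A governance check of this kind can be sketched as plain data plus a lookup. The constraint sets and field names below are invented for illustration; the shape matches the text: the check asks whether the action falls within delegated authority and whether similar actions have been contested, and returns inspectable violations rather than a silent refusal.

```python
# Hypothetical pre-action governance check (all names invented).
DELEGATED_AUTHORITY = {"publish_blog_post", "update_docs"}
CONTESTED_ACTIONS = {"publish_press_release"}

def governance_check(action: str) -> dict:
    violations = []
    if action not in DELEGATED_AUTHORITY:
        violations.append("outside delegated publishing authority")
    if action in CONTESTED_ACTIONS:
        violations.append("similar action previously contested")
    return {"action": action, "approved": not violations, "violations": violations}

print(governance_check("publish_blog_post"))      # approved, no violations
print(governance_check("publish_press_release"))  # violations listed explicitly
```

Because the constraints are plain data, every rule the check consults can be read, versioned, and contested, which is the auditability claim made in the table above.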
After completing a consequential action, the agent records a governance trace — who decided, what, under what authority, when. Traces accumulate into institutional memory.
Example: Every governance decision creates an auditable record that future decisions can reference as precedent.
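A trace of the kind described (who, what, under what authority, when) can be sketched as an ordinary record appended to a log. Field names are illustrative assumptions, not a real schema; the point is that the record is plain, auditable data that later decisions can query as precedent.

```python
import datetime
import json

# Hypothetical governance trace: the "atomic unit" of accountability,
# recorded as plain data (field names invented for illustration).
def record_trace(log: list, actor: str, decision: str, authority: str) -> dict:
    trace = {
        "who": actor,
        "what": decision,
        "under": authority,
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    log.append(trace)  # traces accumulate into institutional memory
    return trace

log = []
record_trace(log, "code_agent", "published release notes", "publishing-delegation")
# A later decision can cite earlier traces as precedent:
precedents = [t for t in log if "publish" in t["what"]]
print(json.dumps(precedents, indent=2))
```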
When a governance check returns violations or requires approval, the agent escalates to the human operator. It does not fail silently or attempt workarounds.
Example: Constraint violation detected → human reviews → approves/denies → decision becomes precedent for future similar situations.
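The escalation loop above can be sketched in a few lines. Everything here is an invented illustration rather than the protocol's actual interface; what it shows is the two properties the text insists on: the agent surfaces violations instead of working around them, and the human's ruling is stored so future similar actions inherit it as precedent.

```python
# Hypothetical escalation flow (all names invented for illustration).
precedents = {}

def escalate(action: str, violations: list, human_ruling: str) -> str:
    # The agent never fails silently or attempts a workaround: it only
    # escalates when a governance check has returned violations.
    assert violations, "escalate only when a check returns violations"
    precedents[action] = human_ruling  # the ruling becomes precedent
    return human_ruling

ruling = escalate("send_invoice", ["outside delegated authority"], "approved")
print(ruling)
print(precedents.get("send_invoice"))  # future checks can consult this
```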
The human operator never reviews every line of AI-generated output. They review every consequential decision. Code is implementation. Governance is authority. The AI holds implementation authority. The human holds governance authority. At no point do these overlap.
The paradigm difference becomes more pronounced over time, not less.
Post-hoc
Works adequately for current AI capability. Jailbreaks are annoying but manageable. Most organisations don't notice the sovereignty issue.
Pre-governance
Requires upfront investment in governance architecture. Pays off in auditability and institutional memory accumulation.
Post-hoc
The filter coverage gap becomes critical. AI capability outpaces filtering capacity. Vendor lock-in becomes apparent, because switching costs are governance costs.
Pre-governance
Institutional memory is now substantial. Governance decisions build on years of precedent. AI vendors are genuinely interchangeable.
Post-hoc
Institutional governance is entirely dependent on vendor decisions. Organisations have no sovereign governance capacity — it was never built.
Pre-governance
Institutional governance is mature, self-reinforcing, and independent of any specific AI system. Authority has compounded like capital.
The question that governs the next century of human-AI relations:
“Who decides what AI is allowed to do?”
Post-hoc filtering answers: the AI vendor. Pre-governance answers: the institution that the AI serves. Every organisation, every government, and every community will eventually have to choose. The architecture they build now determines which choice is still available to them later.
The question is not “which AI model should we use?” It is “what governance architecture will we wrap around AI action?” Model capability is commoditising. Governance architecture is the durable competitive advantage. Organisations that invest in explicit governance structures now will compound institutional memory that cannot be replicated by latecomers.
Current regulatory frameworks focus on model-level governance — classifying models by risk, requiring safety testing before deployment. This is necessary but insufficient. Pre-governance operates at the institutional level, between the model and the action. Regulation should incentivise institutional governance architectures, not just model compliance. The audit trail produced by pre-governance provides exactly the transparency that regulators need.
Alignment research focuses on making AI “want” to be good — encoding values into model behaviour. Pre-governance is orthogonal to this: it does not require AI to be aligned, adversarial, or anything in between. It works at the infrastructure layer regardless of model disposition. This means pre-governance is compatible with any alignment outcome — it provides institutional protection whether alignment succeeds or fails. It is not a competing approach. It is the institutional substrate that alignment research currently lacks.
Every institution that uses AI is making an implicit choice about authority. Defaulting to the AI vendor's governance is itself a governance decision — it delegates institutional sovereignty to a commercial entity whose incentives may diverge from the institution's mission. Pre-governance makes this choice explicit and reversible. It treats AI governance as institutional design, not technical configuration.
Decision Surface
The set of actions available to an agent. Pre-governance constrains the surface; post-hoc filtering monitors actions taken on an unconstrained surface.
Pre-Governance
Designing the conditions for action before delegation occurs. Determines which options may exist, not which outcomes are acceptable.
Post-hoc Filtering
Allowing free action, then classifying, moderating, or rejecting outputs that violate policy. The dominant paradigm in current AI governance.
Institutional Memory
The accumulated record of decisions, precedents, and contestations that makes governance richer over time. Exists in explicit structures, not model weights.
Governance Trace
An auditable record of who decided, what was decided, under what authority, and when. The atomic unit of institutional accountability.
MCP (Model Context Protocol)
The protocol layer between AI capability and institutional action. Defines available tools, governance checks, and boundary conditions.
The foundational concept — designing conditions for governance before delegation
Why legitimacy is determined at the option level, not the outcome level
What happens when systems decide faster than they can justify
670 APIs, 611 pages, 1 human — the empirical evidence