Alignment is technical: getting AI to pursue specified goals. Governance is institutional: determining who specifies goals. Both are necessary—but governance is under-theorised.
Technical problem: How do we make AI systems reliably pursue the goals we specify? Key techniques: RLHF, reward modeling, interpretability, Constitutional AI.
Assumes: We know what goals to specify—the question is achieving them.
Institutional problem: Who has authority to determine what AI should do? Key concerns: legitimacy, representation, accountability, authority.
Asks: Who decides—and on what basis is their authority legitimate?
| Dimension | Alignment | Governance |
|---|---|---|
| Definition | Getting AI to reliably pursue specified goals | Determining who specifies goals and with what authority |
| Primary question | How do we make AI do what we want? | Who decides what AI should want? |
| Discipline | Technical (ML, interpretability, reward modeling) | Institutional (law, policy, political theory) |
| Assumes | Goals are given—we need AI to pursue them | Goals must be specified—we need authority to do so |
| Success looks like | AI reliably pursues specified objectives | Clear, legitimate authority over AI purpose |
| Failure looks like | AI pursues the wrong objectives or games its metrics | No one authorized what AI is doing |
Alignment without governance: AI perfectly pursues goals that no one legitimately authorized. Whose values? Why these values? No answer.
Governance without alignment: Clear authority exists, but AI doesn't reliably pursue authorized goals. Technical failure undermines institutional design.
Neither: AI pursues emergent goals that nobody authorized or intended. This is the current default.
Both together: Legitimate authority specifies goals, and AI reliably pursues them. This is the target state, and reaching it requires advances in both alignment research and governance theory.
IRSA's research aims to address this imbalance by developing governance theory as rigorously as alignment research, treating authority over AI purpose as a fundamental question that requires its own theoretical framework.
AI alignment is the technical problem of getting AI systems to reliably pursue the goals we specify. It includes techniques like RLHF, Constitutional AI, reward modeling, and interpretability research. Alignment assumes we know what goals to specify—the question is making AI pursue them.
AI governance is the institutional problem of determining who has authority to specify what AI should do. It includes questions of democratic legitimacy, stakeholder representation, accountability structures, and authority distribution. Governance asks who decides—the question alignment assumes is answered.
Alignment without governance means AI perfectly pursues goals no one legitimately authorized. Governance without alignment means legitimate authority can't translate into reliable AI behavior. Neither alone solves the problem of AI serving human values.
Alignment is more developed: substantial technical research, major labs dedicating resources, and a clear problem formulation. Governance is under-theorised: fragmented across law, policy, and political theory, with no clear framework for determining who should decide what values AI pursues.