Emergent Behaviour in Multi-Agent AI Systems

Why This Paper Matters

Safety research has largely focused on individual AI models. But as multi-agent deployments become standard, the system-level behaviours that emerge from AI coordination may be harder to anticipate and control than any single model's outputs.

Key Concepts

Emergent coordination: Behaviours and strategies that arise only when agents interact, and that were neither present nor predicted at the level of any individual agent.
Alignment at scale: Why aligning each model individually may not be sufficient, since the collective system can still produce misaligned outcomes.
Oversight challenges: How existing monitoring and interpretability tools fail to capture what's happening across a network of agents.

Further Reading

AI Alignment: Foundational Challenges

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs