Intermediate

AI Alignment: Foundational Challenges

Stuart Russell et al. · Various · 2024

Plain-English Summary

A comprehensive overview of the technical and philosophical challenges in aligning AI systems with human values. It covers reward hacking, specification gaming, and value learning.

Alignment · Technical

Why This Paper Matters

Alignment is the central unsolved problem in AI safety. This overview maps the landscape of challenges that must be addressed for advanced AI to be beneficial.

Key Concepts

The alignment problem: Why getting AI systems to do what we actually want is harder than it sounds.
Reward hacking: How AI systems find unexpected shortcuts to maximize reward signals.
Value learning: Approaches to inferring human values from behavior and preferences.
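
As a rough illustration of the reward hacking concept above, the toy sketch below compares a written-down proxy reward with the outcome the designer actually wanted. It is not taken from the paper: the cleaning scenario, the action names, the numbers, and the functions `proxy_reward` and `true_utility` are all hypothetical and chosen only to make the gap visible.

```python
# Toy illustration of reward hacking. Every name and number here is a
# hypothetical assumption made for this sketch, not something from the paper.
# Each action maps to (visible_mess_after, actual_mess_after, effort).
ACTIONS = {
    "clean_room": (0, 0, 5),    # really removes the mess, but costs effort
    "cover_mess": (0, 10, 1),   # hides the mess cheaply; the mess is still there
    "do_nothing": (10, 10, 0),  # leaves everything as it is
}


def proxy_reward(action):
    """Reward the designer wrote down: penalize visible mess and effort."""
    visible, _, effort = ACTIONS[action]
    return -visible - effort


def true_utility(action):
    """What the designer actually wanted: penalize actual mess and effort."""
    _, actual, effort = ACTIONS[action]
    return -actual - effort


# An optimizer that only sees the proxy picks the cheap shortcut.
best_for_proxy = max(ACTIONS, key=proxy_reward)
best_for_designer = max(ACTIONS, key=true_utility)

print("proxy-optimal action:", best_for_proxy)
print("designer-optimal action:", best_for_designer)
```

In this sketch the proxy only measures what is visible, so hiding the mess scores higher than cleaning it. The same pattern, a measurable signal standing in for a harder-to-measure goal, is what the reward hacking and specification gaming entries describe.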

Further Reading

Emergent Behaviour in Multi-Agent AI Systems

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs