Superalignment
The research problem of how to make AI systems that are smarter than humans (superintelligence) safe and controllable.
Superalignment = How do you control an AI smarter than all of humanity? OpenAI's biggest research problem – unsolved, but critical if AGI arrives.
Explanation
OpenAI founded a Superalignment team in 2023 (dissolved in 2024). Core idea: use weaker AI systems to supervise stronger ones, so that human-level oversight can scale to superhuman models; the sketch below illustrates the setup. Closely related to Scalable Oversight and Interpretability.
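A minimal sketch of the weak-to-strong idea on a toy classification task (the task, model choices, and sample sizes are illustrative assumptions, not OpenAI's actual experiments): a weak supervisor labels data imperfectly, a higher-capacity student trains only on those labels, and the question is whether the student ends up closer to the ground truth than its supervisor.

```python
# Toy sketch of weak-to-strong supervision (illustrative assumptions,
# not OpenAI's actual method): a weak supervisor labels data for a
# stronger student, then we check if the student exceeds its supervisor.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic task with ground-truth labels (stand-in for human intent).
X, y = make_classification(n_samples=4000, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Weak" supervisor: a simple model trained on a small labeled subset.
weak = LogisticRegression(max_iter=1000).fit(X_train[:200], y_train[:200])

# The weak supervisor labels all training data; these labels are noisy.
weak_labels = weak.predict(X_train)

# "Strong" student: a higher-capacity model trained only on weak labels,
# never on the ground truth.
strong = GradientBoostingClassifier(random_state=0).fit(X_train, weak_labels)

# Weak-to-strong generalization: does the student beat its supervisor
# on ground truth despite learning only from imperfect supervision?
print("weak supervisor accuracy:", weak.score(X_test, y_test))
print("strong student accuracy: ", strong.score(X_test, y_test))
```

OpenAI's weak-to-strong generalization experiments reported an analogous effect at scale: strong students trained on weak labels often recover part of the gap to ground truth, which is the hope behind using weaker systems to supervise stronger ones.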
Marketing Relevance
If AGI/ASI is achieved, superalignment becomes the most important technical challenge: it determines whether superintelligent AI follows human values.
Common Pitfalls
The problem may be unsolvable, and there is no consensus on approaches. Time pressure competes with thoroughness. Partial progress could also create an unjustified sense of safety.
Origin & History
Ilya Sutskever and Jan Leike co-founded OpenAI's Superalignment team in July 2023, backed by a pledge of 20% of OpenAI's compute. The team dissolved in 2024 after both left the company. The problem remains one of the biggest open challenges in AI research.
Comparisons & Differences
Superalignment vs. Alignment
Alignment tunes current models to human intent; superalignment addresses future superintelligent systems that are qualitatively different and harder to supervise.
Superalignment vs. AI Safety
AI Safety is the broad field; superalignment focuses specifically on the control problem for superintelligent systems.