Self-Play
Self-Play is an RL training method where an agent plays against copies of itself, continuously improving through competition.
Self-Play trains an AI against itself – the method behind AlphaGo and AlphaZero, which reached superhuman performance, in AlphaZero's case without any human game data.
Explanation
The agent generates its own training opponents, which improve as it does. This creates a natural curriculum from easy to hard and can ultimately yield superhuman performance.
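As a rough illustration of this loop, the following Python sketch uses a made-up toy game (both players pick a number from 0 to 9; the higher number wins). The game, the preference-table policy, and every name in the code are illustrative assumptions, not the algorithm behind AlphaGo or AlphaZero: the learner plays against a frozen snapshot of itself and refreshes that snapshot periodically, so the opponent keeps pace and forms the curriculum described above.

```python
import random

# Toy, hypothetical game: both players pick a number 0-9, higher number wins.
# The learner plays against a frozen snapshot of itself and nudges its policy
# toward actions that win; the snapshot is refreshed periodically so the
# opponent keeps pace with the learner ("opponents that improve as it does").

ACTIONS = list(range(10))

def sample(policy):
    """Sample an action from an unnormalized preference table."""
    total = sum(policy.values())
    r = random.uniform(0, total)
    cum = 0.0
    for a in ACTIONS:
        cum += policy[a]
        if r <= cum:
            return a
    return ACTIONS[-1]

def self_play_train(iterations=5000, snapshot_every=500):
    policy = {a: 1.0 for a in ACTIONS}   # learner's action preferences
    opponent = dict(policy)              # frozen copy = current opponent
    for step in range(1, iterations + 1):
        my_move = sample(policy)
        opp_move = sample(opponent)
        if my_move > opp_move:           # reinforce winning actions
            policy[my_move] += 1.0
        elif my_move < opp_move:         # weaken losing actions
            policy[my_move] = max(0.1, policy[my_move] - 0.5)
        if step % snapshot_every == 0:   # opponent catches up
            opponent = dict(policy)
    return policy

if __name__ == "__main__":
    learned = self_play_train()
    best = max(learned, key=learned.get)
    print("Most-preferred action after self-play:", best)  # tends toward 9
```

In real systems the update step would be a full RL algorithm (policy gradients, MCTS-guided learning, etc.) rather than a simple preference bump, but the overall structure – play yourself, learn from the outcomes, refresh the opponent – is the same.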
Marketing Relevance
Self-Play enabled AlphaGo and AlphaZero, and related self-improvement ideas are increasingly used in LLM training (e.g., debate, Constitutional AI).
Common Pitfalls
Self-play can get stuck in local optima or in cycles (rock-paper-scissors dynamics), because game strategies are often non-transitive: A beats B and B beats C, yet C beats A. It also has high compute requirements, since the agent must generate all of its own training games.
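One widely used mitigation, sketched here only as an assumption about how such a fix might look rather than as part of this entry, is to keep a pool of past policy snapshots and sample opponents from the whole pool instead of always facing the newest copy, so the agent cannot simply forget how to beat earlier strategies. The OpponentPool class and its parameters below are hypothetical.

```python
import copy
import random

class OpponentPool:
    """Hypothetical pool of frozen policy snapshots used as self-play opponents."""

    def __init__(self, max_size=20):
        self.snapshots = []
        self.max_size = max_size

    def add(self, policy):
        """Store a frozen copy of the current policy as a future opponent."""
        self.snapshots.append(copy.deepcopy(policy))
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)        # drop the oldest snapshot

    def sample(self, latest_prob=0.5):
        """Mostly face the newest snapshot, sometimes an older one."""
        if not self.snapshots:
            raise ValueError("add() at least one snapshot before sampling")
        if len(self.snapshots) == 1 or random.random() < latest_prob:
            return self.snapshots[-1]
        return random.choice(self.snapshots[:-1])
```

Sampling mostly the latest snapshot but occasionally older ones is loosely the idea behind league-style training in systems such as AlphaStar, though production setups are far more elaborate.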
Origin & History
Tesauro's TD-Gammon (1995) was an early success. DeepMind's AlphaGo (2016) demonstrated self-play in Go, and AlphaZero (2017) extended it to chess and shogi. OpenAI Five (2019) applied self-play to Dota 2.
Comparisons & Differences
Self-Play vs. Supervised Learning from Games
Supervised learning needs human game records and is effectively capped near the level of its teachers; Self-Play generates unlimited training data and can exceed human level.