
    Self-Play

    Also known as:
    Self-Play Training
    Self-Competition
    Competitive Self-Play
    Updated: 2/10/2026

    Self-Play is a reinforcement-learning (RL) training method in which an agent plays against copies of itself, continuously improving through competition.

    Quick Summary

    Self-Play trains an AI against itself. It is the method behind AlphaGo and AlphaZero, and with AlphaZero it achieved superhuman performance without any human game data.

    Explanation

    The agent generates its own training opponents, which improve as it does. This creates a natural curriculum from easy to hard opponents and can lead to superhuman performance.
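The shared-opponent idea above can be sketched with tabular Q-learning on the game of Nim (an illustrative toy, not from this article; the game, parameters, and helper names are assumptions). Both "players" read and update the same Q-table, so every improvement immediately produces a stronger opponent:

```python
import random

random.seed(0)

# Nim: 10 stones, players alternately take 1-3, whoever takes the last wins.
N = 10
Q = {}  # Q[(stones, take)]: value of the move from the mover's perspective

def legal(s):
    return [a for a in (1, 2, 3) if a <= s]

def q(s, a):
    return Q.get((s, a), 0.0)

def greedy(s):
    return max(legal(s), key=lambda a: q(s, a))

alpha, eps = 0.5, 0.2
for _ in range(5000):
    s = N
    while s > 0:
        # epsilon-greedy move by whichever player is to act
        a = greedy(s) if random.random() > eps else random.choice(legal(s))
        nxt = s - a
        if nxt == 0:
            target = 1.0  # taking the last stone wins
        else:
            # The opponent (a copy of ourselves) moves next; its best
            # outcome is our worst, so we negate its best Q-value.
            target = -max(q(nxt, b) for b in legal(nxt))
        Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))
        s = nxt
```

With these settings the greedy policy recovers the known optimal strategy of leaving the opponent a multiple of 4 stones, e.g. taking 2 from a pile of 10, without ever seeing an expert game.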

    Marketing Relevance

    Self-Play enabled AlphaGo and AlphaZero, and it is increasingly used in LLM training, for example in debate-style setups and Constitutional AI.

    Common Pitfalls

    Self-Play can get stuck in local optima or cycle among non-transitive strategies, as in rock-paper-scissors, where every strategy beats one rival and loses to another. It also tends to have high compute requirements, since the agent must generate all of its own training games.
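The non-transitive cycling pitfall can be seen in a toy sketch (hypothetical, not from this article): a naive best-response self-play loop on rock-paper-scissors never converges, because each generation simply counters its frozen previous self.

```python
# Each pure strategy is beaten by exactly one other strategy.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

history = ["rock"]
for _ in range(6):
    # Best response to the previous version of the agent.
    history.append(BEATS[history[-1]])

# "Improvement" just cycles: rock -> paper -> scissors -> rock -> ...
```

Each new generation does beat the previous one, yet overall no progress is made, which is why practical systems train against a pool of past checkpoints rather than only the latest self.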

    Origin & History

    Gerald Tesauro's TD-Gammon (1995) was an early success, learning backgammon through self-play. AlphaGo (DeepMind, 2016) and AlphaZero (2017) demonstrated self-play in Go, chess, and shogi. OpenAI Five (2019) applied it to Dota 2.

    Comparisons & Differences

    Self-Play vs. Supervised Learning from Games

    Supervised learning needs human game records and is effectively capped by the quality of those records; Self-Play generates unlimited training data and can exceed human-level play.
