    Artificial Intelligence

    Thompson Sampling

    Also known as:
    Bayesian Bandit
    Posterior Sampling
    Probability Matching
    Updated: 2/11/2026

    A Bayesian bandit algorithm that selects actions in proportion to the probability that they are optimal.

    Quick Summary

    Thompson Sampling selects options in proportion to the probability that they are optimal – arguably the most elegant bandit algorithm, published in 1933 but widely adopted only since around 2010.

    Explanation

    Thompson Sampling maintains a posterior distribution over each option's expected reward. Each round, it draws one sample from each posterior and selects the option with the highest sample, then updates that option's posterior with the observed reward. This naturally balances exploration (uncertain options occasionally produce high samples) and exploitation (options known to be good usually do).
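    The loop above can be sketched for the common Beta-Bernoulli case – a minimal illustration, assuming each option yields a 0/1 reward (e.g., click / no click) and a uniform Beta(1, 1) prior; the arm names and reward rates are made up for the example:

    ```python
    import random

    # Beta-Bernoulli Thompson Sampling sketch.
    # Posterior for each arm is Beta(successes + 1, failures + 1).

    def choose_arm(successes, failures):
        """Draw one sample per arm from its posterior; pick the highest draw."""
        samples = [random.betavariate(s + 1, f + 1)
                   for s, f in zip(successes, failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(successes, failures, arm, reward):
        """Bayesian update: a Bernoulli reward just increments one count."""
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

    # Usage: three hypothetical ad variants with unknown click rates.
    true_rates = [0.04, 0.05, 0.08]
    successes, failures = [0, 0, 0], [0, 0, 0]
    for _ in range(5000):
        arm = choose_arm(successes, failures)
        reward = 1 if random.random() < true_rates[arm] else 0
        update(successes, failures, arm, reward)
    # Over time, most pulls concentrate on the best arm.
    ```

    Note how exploration is automatic: an arm with few observations has a wide posterior, so it occasionally produces the highest sample and gets tried again.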

    Marketing Relevance

    Well suited to marketing optimization: ad creative selection, headline testing, and recommendation ranking. With many variants it is typically more sample-efficient than a classic A/B test, because traffic shifts toward the winners while the test is still running.

    Common Pitfalls

    The choice of prior influences early results. Delayed rewards (e.g., conversions that arrive days later) require special handling, since the posterior lags behind reality. Non-stationary environments, where reward rates drift over time, need decay of old observations.
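    For the non-stationarity pitfall, one common remedy is to discount old evidence before each update. A hypothetical sketch for a single Beta-Bernoulli arm (the decay factor `gamma` is an assumed tuning parameter, not part of the base algorithm):

    ```python
    # Discounted update for a non-stationary Beta-Bernoulli arm:
    # multiply both pseudo-counts by gamma < 1 each round,
    # so older observations gradually lose weight.
    def discounted_update(alpha, beta, reward, gamma=0.99):
        alpha, beta = gamma * alpha, gamma * beta  # forget a little
        if reward:
            alpha += 1.0
        else:
            beta += 1.0
        return alpha, beta

    # With gamma = 0.99 the pseudo-counts are capped near 1 / (1 - gamma) = 100,
    # i.e., the effective memory is roughly the last 100 observations.
    ```

    The cap keeps the posterior from collapsing to a point, so the algorithm never stops exploring entirely and can track a drifting reward rate.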

    Origin & History

    William R. Thompson published the algorithm in 1933 – one of the earliest ML algorithms ever. Chapelle & Li (2011) demonstrated its efficiency for online advertising. Today standard at Google, Netflix, and Spotify for personalization.

    Comparisons & Differences

    Thompson Sampling vs. UCB (Upper Confidence Bound)

    UCB deterministically selects the option with highest upper confidence bound; Thompson Sampling is stochastic (samples from posteriors).
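    The determinism of UCB can be seen in a small sketch of the UCB1 index (a standard formulation; the numbers in the usage note are illustrative): the arm choice is a fixed function of the counts, with no sampling involved.

    ```python
    import math

    # UCB1 index: empirical mean plus an exploration bonus that
    # shrinks as an arm is pulled more often.
    def ucb1_arm(means, counts, t):
        """Deterministically pick the arm with the highest upper confidence bound."""
        scores = [m + math.sqrt(2 * math.log(t) / n)
                  for m, n in zip(means, counts)]
        return max(range(len(scores)), key=scores.__getitem__)
    ```

    Given two arms with equal means but 10 vs. 100 pulls, UCB1 always picks the less-explored arm, whereas Thompson Sampling would pick it only with some probability.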

    Thompson Sampling vs. Epsilon-Greedy

    Epsilon-Greedy explores uniformly at random at a fixed rate ε; Thompson Sampling explores in proportion to its uncertainty, so exploration automatically tapers off as the posteriors sharpen.
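    For contrast, a minimal Epsilon-Greedy selection rule (standard formulation; the means and ε value are illustrative):

    ```python
    import random

    # Epsilon-Greedy: with probability eps explore uniformly at random,
    # otherwise exploit the arm with the best empirical mean.
    def epsilon_greedy_arm(means, eps=0.1):
        if random.random() < eps:
            return random.randrange(len(means))  # blind exploration
        return max(range(len(means)), key=means.__getitem__)  # exploitation
    ```

    The ε fraction of traffic is spread evenly over all arms, including clearly bad ones; Thompson Sampling instead concentrates its exploration on arms whose posteriors are still wide.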

