Thompson Sampling
Bayesian bandit algorithm that selects each action in proportion to the probability that it is optimal.
Often called the most elegant bandit algorithm, Thompson Sampling has been known since 1933 but only became widely popular around 2010.
Explanation
Thompson Sampling maintains a posterior distribution over each option's expected reward. Each round, it draws one sample from each posterior and selects the option with the highest sample. This naturally balances exploration (uncertain options occasionally produce high samples) and exploitation (options known to be good usually win).
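The procedure above can be sketched for Bernoulli rewards (e.g., clicks) with Beta posteriors; this is a minimal illustrative implementation, and names like `ThompsonSampler`, `select_arm`, and `update` are our own, not from any library:

```python
import random

class ThompsonSampler:
    def __init__(self, n_arms):
        # Beta(1, 1) prior = uniform belief over each arm's success rate
        self.alpha = [1] * n_arms
        self.beta = [1] * n_arms

    def select_arm(self):
        # Draw one sample from each arm's posterior, pick the highest
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Bernoulli reward: 1 = success (e.g., click), 0 = failure
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1
```

As the posteriors sharpen, samples from clearly inferior arms rarely come out on top, so exploration fades automatically.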
Marketing Relevance
Well suited to marketing optimization: ad creative selection, headline testing, recommendation ranking. When there are many variants, it reaches reliable conclusions with less traffic than a classical A/B test.
Common Pitfalls
The choice of prior influences early results. Delayed rewards (e.g., conversions arriving days later) require special handling. Non-stationary environments call for discounting (decaying) old observations.
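The non-stationarity point can be sketched as a decayed Beta update that shrinks old evidence toward the prior before each new observation; this is a common heuristic, not part of the original algorithm, and the decay factor `gamma` here is an illustrative choice:

```python
def decayed_update(alpha, beta, reward, gamma=0.99):
    # Shrink accumulated evidence toward the Beta(1, 1) prior,
    # so observations from many rounds ago carry less weight...
    alpha = 1 + gamma * (alpha - 1)
    beta = 1 + gamma * (beta - 1)
    # ...then incorporate the new Bernoulli observation
    if reward:
        alpha += 1
    else:
        beta += 1
    return alpha, beta
```

With `gamma < 1` the posterior never fully freezes, so the sampler keeps a small amount of exploration alive and can track a drifting reward rate.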
Origin & History
William R. Thompson published the algorithm in 1933, making it one of the earliest machine-learning algorithms. Chapelle & Li (2011) demonstrated its efficiency for online advertising. Today it is standard at Google, Netflix, and Spotify for personalization.
Comparisons & Differences
Thompson Sampling vs. UCB (Upper Confidence Bound)
UCB deterministically selects the option with the highest upper confidence bound; Thompson Sampling is stochastic, drawing samples from the posteriors.
Thompson Sampling vs. Epsilon-Greedy
Epsilon-Greedy explores uniformly at random at a fixed rate ε; Thompson Sampling explores in proportion to its uncertainty, concentrating on options that could plausibly be optimal.
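The contrast between the three selection rules can be sketched side by side; the UCB1 bonus formula is the standard one from the bandit literature, while the ε value and the Beta-posterior reward model are illustrative assumptions:

```python
import math
import random

def ucb1_select(counts, means, t):
    # Deterministic: empirical mean plus confidence bonus;
    # any untried arm is selected first
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [m + math.sqrt(2 * math.log(t) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

def epsilon_greedy_select(means, epsilon=0.1):
    # Fixed-rate uniform exploration, otherwise exploit the best mean
    if random.random() < epsilon:
        return random.randrange(len(means))
    return max(range(len(means)), key=means.__getitem__)

def thompson_select(alpha, beta):
    # Stochastic: exploration emerges from posterior uncertainty itself
    samples = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(samples)), key=samples.__getitem__)
```

Note the structural difference: UCB1's exploration bonus shrinks with a fixed schedule, Epsilon-Greedy's exploration rate never adapts, and Thompson Sampling's exploration scales with how uncertain each posterior still is.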