Bandit-Based Recommendation
Recommendation systems using multi-armed bandits to balance exploration of new items with exploitation of known preferences.
Unlike batch-trained recommenders, bandit-based systems learn online, weighing exploration of new items against exploitation of proven ones – ideal wherever feedback arrives quickly.
Explanation
Contextual bandits treat user context (such as device, time of day, or browsing history) as features and learn online which items perform best in which contexts. No batch retraining is needed – the model updates continuously with every interaction.
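The explore/exploit idea can be sketched with a minimal per-segment epsilon-greedy bandit – a simpler cousin of contextual bandits in which the "context" is a discrete user segment. All class and parameter names here are illustrative, not from any specific library:

```python
import random
from collections import defaultdict

class SegmentedEpsilonGreedy:
    """Epsilon-greedy bandit with one value table per context segment.
    Illustrative sketch: with probability epsilon we explore a random
    item; otherwise we exploit the best-known item for this segment."""

    def __init__(self, items, epsilon=0.1):
        self.items = list(items)
        self.epsilon = epsilon
        # counts[segment][item] = pulls, values[segment][item] = running mean reward
        self.counts = defaultdict(lambda: defaultdict(int))
        self.values = defaultdict(lambda: defaultdict(float))

    def select(self, segment):
        if random.random() < self.epsilon:          # explore a new item
            return random.choice(self.items)
        vals = self.values[segment]                 # exploit the proven item
        return max(self.items, key=lambda i: vals[i])

    def update(self, segment, item, reward):
        self.counts[segment][item] += 1
        n = self.counts[segment][item]
        # incremental mean update: learning happens online, no batch retraining
        self.values[segment][item] += (reward - self.values[segment][item]) / n
```

After each impression, `update` folds the observed reward (e.g., click = 1, no click = 0) into the running estimate, so the policy shifts toward better items per segment as data accumulates.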
Marketing Relevance
Ideal for marketing personalization: website banners, email subject lines, product recommendations – anything with fast feedback loops.
Example
A news feed uses LinUCB to find the optimal mix of known and new articles for each user context.
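A compact sketch of disjoint LinUCB (the variant from Li et al., 2010) shows how the article choice combines a per-arm estimate with an uncertainty bonus; the class and parameter names are my own, and `alpha` is the usual exploration-width parameter:

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one linear reward model per arm (article).
    Selection score = predicted reward + alpha * confidence width."""

    def __init__(self, n_arms, n_features, alpha=1.0):
        self.alpha = alpha
        # Per arm: A = d x d design matrix (starts as identity), b = reward vector
        self.A = [np.eye(n_features) for _ in range(n_arms)]
        self.b = [np.zeros(n_features) for _ in range(n_arms)]

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                        # ridge-regression estimate
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(theta @ x + bonus)         # mean estimate + uncertainty
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed reward for the shown article into that arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

Because the bonus term shrinks as an arm accumulates data, new articles get a temporary boost (exploration) while well-measured articles are ranked mostly by their estimated reward (exploitation).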
Common Pitfalls
Delayed rewards (e.g., conversions that arrive days after the impression) are hard to attribute, and optimizing on immediate clicks alone can bias the policy toward short-term engagement. Careful reward signal design is crucial.
Origin & History
Li et al. (2010) introduced LinUCB for personalized news recommendation. Yahoo and Microsoft were early adopters of bandits for ad selection. Contextual bandits have since become a standard tool for online personalization.
Comparisons & Differences
Bandit-Based Recommendation vs. A/B Testing
A/B testing allocates traffic statically across a few variants and only acts on the result once the test concludes; bandits reallocate traffic continuously toward better-performing options and scale to many arms.