Lookahead Optimizer
A meta-optimizer that maintains two sets of weights: "fast" weights updated by an inner base optimizer, and "slow" weights that are periodically interpolated toward the fast ones.
Lookahead maintains fast and slow weights; the periodic interpolation stabilizes training, and the method can be layered on top of any base optimizer.
Explanation
Every k steps the slow weights are updated as slow_weights = slow_weights + α × (fast_weights − slow_weights), and the fast weights are then reset to the new slow weights. The slow weights act as a stabilizing anchor. Ranger = Lookahead + RAdam.
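A minimal sketch of this update rule as a wrapper around an arbitrary PyTorch optimizer; the class name and the default values k=5, α=0.5 are illustrative choices, not the reference implementation:

```python
import torch

class Lookahead:
    """Sketch of a Lookahead wrapper around an inner ("fast") optimizer."""

    def __init__(self, base_optimizer, k=5, alpha=0.5):
        self.base = base_optimizer          # inner optimizer, e.g. AdamW
        self.k = k                          # synchronization interval
        self.alpha = alpha                  # interpolation factor
        self.step_count = 0
        # slow weights start as a copy of the current (fast) parameters
        self.slow = [
            [p.detach().clone() for p in group["params"]]
            for group in self.base.param_groups
        ]

    def zero_grad(self, set_to_none=True):
        self.base.zero_grad(set_to_none=set_to_none)

    def step(self):
        self.base.step()                    # fast update by the inner optimizer
        self.step_count += 1
        if self.step_count % self.k == 0:   # every k steps: slow update
            with torch.no_grad():
                for group, slow_group in zip(self.base.param_groups, self.slow):
                    for p, slow_p in zip(group["params"], slow_group):
                        # slow = slow + alpha * (fast - slow)
                        slow_p.add_(p.detach() - slow_p, alpha=self.alpha)
                        # reset fast weights to the new slow weights
                        p.copy_(slow_p)
```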
Marketing Relevance
Lookahead can be layered on any optimizer, reduces the variance of training runs, and makes results more robust to suboptimal hyperparameters of the inner optimizer, which reduces the need for extensive hyperparameter search.
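Hypothetical usage, layering the Lookahead sketch from above on AdamW; the model and training data are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
base = torch.optim.AdamW(model.parameters(), lr=1e-3)
opt = Lookahead(base, k=5, alpha=0.5)   # Lookahead class from the sketch above

x, y = torch.randn(32, 10), torch.randn(32, 1)
for _ in range(20):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```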
Common Pitfalls
Additional memory is required for the slow weights. The synchronization interval k and the interpolation factor α must be chosen. Not always better than a well-tuned AdamW.
Origin & History
Zhang et al. (2019, University of Toronto) proposed Lookahead. The combination "Ranger" (Lookahead + RAdam, Less Wright 2019) became popular in the Fast.ai community.
Comparisons & Differences
Lookahead Optimizer vs. EMA
EMA (exponential moving average) smooths the weights continuously, and the averaged copy is typically used only at inference; Lookahead interpolates the slow weights periodically and feeds them back into training for stability. Both maintain a "smoothed" copy of the weights.
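A small illustrative comparison of the two smoothing rules; the values for decay, α, and k are arbitrary examples, and the "optimizer update" is simulated with noise:

```python
import torch

fast = torch.randn(4)                 # current (training) weights
ema = fast.clone()                    # EMA copy, used at inference time
slow = fast.clone()                   # Lookahead slow weights

decay, alpha, k = 0.999, 0.5, 5

for step in range(1, 101):
    fast += 0.01 * torch.randn(4)     # stand-in for an optimizer update

    # EMA: smoothed every step, does not feed back into training
    ema.mul_(decay).add_(fast, alpha=1 - decay)

    # Lookahead: interpolated every k steps, then copied back into training
    if step % k == 0:
        slow.add_(fast - slow, alpha=alpha)
        fast.copy_(slow)
```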