One-Cycle Policy (Super-Convergence)
Learning rate schedule that first ramps the LR up (warmup) and then decreases it to a very low value, enabling training in a fraction of the usual epochs.
The one-cycle policy combines an aggressive warmup with cosine decay and an inversely cycled momentum schedule, enabling "super-convergence" with up to 10x fewer epochs.
Explanation
The LR rises from a small initial value to its maximum, then decays (typically via cosine annealing) to a value far below the starting LR. At the same time, momentum is varied inversely: it falls while the LR rises and rises again while the LR falls. The result is super-convergence, i.e. training that can be up to 10x faster.
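A minimal sketch of this schedule using PyTorch's built-in OneCycleLR; the stand-in model, step counts, and hyperparameter values below are illustrative placeholders, not recommendations.

```python
# One-cycle schedule with PyTorch's OneCycleLR: LR ramps up to max_lr and then
# anneals down, while momentum is cycled inversely to the LR.
import torch
from torch import nn
from torch.optim.lr_scheduler import OneCycleLR

model = nn.Linear(10, 2)                                   # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.95)

steps_per_epoch, epochs = 100, 5
scheduler = OneCycleLR(
    optimizer,
    max_lr=0.1,               # peak LR, ideally found with an LR range test
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
    pct_start=0.3,            # fraction of the cycle spent ramping the LR up
    anneal_strategy="cos",    # cosine-shaped annealing
    cycle_momentum=True,      # momentum moves inversely to the LR
    base_momentum=0.85,
    max_momentum=0.95,
)

for epoch in range(epochs):
    for step in range(steps_per_epoch):
        # ... forward pass, loss computation, loss.backward() ...
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()      # advance the schedule once per batch, not per epoch
```

Note that the scheduler is stepped per batch: the one-cycle shape is defined over the total number of optimizer steps, not over epochs.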
Marketing Relevance
Especially effective for fine-tuning and classification tasks. Fastai uses one-cycle as its default training schedule (e.g. in fit_one_cycle).
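A hedged usage sketch with the fastai high-level API (fastai v2 assumed); the dataset path, architecture, epoch count, and lr_max are illustrative placeholders.

```python
# Training with fastai's default one-cycle schedule.
from fastai.vision.all import ImageDataLoaders, vision_learner, resnet34, accuracy

dls = ImageDataLoaders.from_folder("path/to/images", valid_pct=0.2)  # placeholder path
learn = vision_learner(dls, resnet34, metrics=accuracy)

learn.lr_find()                        # LR range test to pick a sensible peak LR
learn.fit_one_cycle(5, lr_max=3e-3)    # one-cycle schedule over 5 epochs
```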
Common Pitfalls
The maximum LR should be determined with an LR range test (LR finder); a peak that is set too high can destabilize training. One-cycle is not optimal for every task, and for LLM pre-training it is less common than plain warmup + cosine decay.
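A minimal sketch of such an LR range test, assuming a generic PyTorch model and data loader; the bounds, step count, and divergence heuristic are illustrative choices (fastai provides a built-in equivalent via learn.lr_find()).

```python
import torch

def lr_range_test(model, loader, loss_fn, lr_min=1e-6, lr_max=10.0, num_steps=200):
    """Exponentially increase the LR over a few hundred batches and record the loss."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr_min)
    growth = (lr_max / lr_min) ** (1.0 / num_steps)    # multiplicative LR step
    lrs, losses = [], []
    data_iter = iter(loader)
    for _ in range(num_steps):
        try:
            x, y = next(data_iter)
        except StopIteration:                          # restart the loader if exhausted
            data_iter = iter(loader)
            x, y = next(data_iter)
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        lrs.append(optimizer.param_groups[0]["lr"])
        losses.append(loss.item())
        if losses[-1] > 4 * min(losses):               # crude divergence check
            break
        for group in optimizer.param_groups:           # raise the LR for the next batch
            group["lr"] *= growth
    return lrs, losses   # choose max_lr somewhat below the LR where the loss blew up
```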
Origin & History
Leslie Smith (2017/2018) discovered super-convergence: certain LR schedules enable much faster training than standard schedules. Fast.ai (Jeremy Howard) popularized the method and made it the default schedule in the Fastai library.
Comparisons & Differences
One-Cycle Policy (Super-Convergence) vs. Cosine Annealing
Cosine annealing only decreases the LR; one-cycle first increases it (warmup phase) and additionally cycles momentum inversely, which is more aggressive but often converges faster.
One-Cycle Policy (Super-Convergence) vs. Warmup + Linear Decay
Warmup + decay is more conservative; one-cycle uses a higher peak LR and an inverse momentum schedule for faster convergence (compare the schedule shapes in the sketch below).
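To make the differences concrete, here is a small, self-contained sketch that evaluates the three schedule shapes as functions of training progress t in [0, 1]; the peak LR, warmup fraction, and div factor are illustrative, and the same lr_max is used for all three purely for comparability (in practice one-cycle typically runs with a higher peak than the other two).

```python
import math

def cosine_annealing(t, lr_max=0.1, lr_min=0.0):
    # decay only: starts at lr_max and falls to lr_min
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

def warmup_linear_decay(t, lr_max=0.1, warmup=0.1):
    # conservative: linear ramp to lr_max, then linear decay to zero
    if t < warmup:
        return lr_max * t / warmup
    return lr_max * (1 - (t - warmup) / (1 - warmup))

def one_cycle(t, lr_max=0.1, pct_start=0.3, div=25.0):
    # ramp from lr_max/div up to the peak, then cosine decay toward zero
    lr_start = lr_max / div
    if t < pct_start:
        return lr_start + (lr_max - lr_start) * t / pct_start
    decay_t = (t - pct_start) / (1 - pct_start)
    return lr_max * 0.5 * (1 + math.cos(math.pi * decay_t))

for t in (0.0, 0.15, 0.3, 0.6, 1.0):
    print(f"t={t:.2f}  cosine={cosine_annealing(t):.4f}  "
          f"warmup+linear={warmup_linear_decay(t):.4f}  one-cycle={one_cycle(t):.4f}")
```

Writing the schedules as pure functions of progress keeps the comparison framework-independent; in real training the same shapes would be driven per batch by a scheduler, as in the PyTorch sketch above.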