One-Cycle Policy (Super-Convergence)
Learning rate schedule that first ramps the LR up (warmup) and then decreases it to a very low value, enabling training in a fraction of the usual epochs.
The one-cycle policy combines an aggressive warmup with cosine decay and an inversely cycled momentum schedule, enabling "super-convergence" with up to 10x fewer epochs.
Explanation
The LR rises from a small initial value to its maximum, then decays (typically via cosine annealing) to a value far below the starting LR. At the same time, momentum is varied inversely: it falls while the LR rises and rises again while the LR falls. The result is super-convergence, i.e. training that can be up to 10x faster.
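A minimal sketch of this schedule using PyTorch's built-in OneCycleLR; the stand-in model, step counts, and hyperparameter values below are illustrative placeholders, not recommendations.

```python
# One-cycle schedule with PyTorch's OneCycleLR: LR ramps up to max_lr and then
# anneals down, while momentum is cycled inversely to the LR.
import torch
from torch import nn
from torch.optim.lr_scheduler import OneCycleLR

model = nn.Linear(10, 2)                                   # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.95)

steps_per_epoch, epochs = 100, 5
scheduler = OneCycleLR(
    optimizer,
    max_lr=0.1,               # peak LR, ideally found with an LR range test
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
    pct_start=0.3,            # fraction of the cycle spent ramping the LR up
    anneal_strategy="cos",    # cosine-shaped annealing
    cycle_momentum=True,      # momentum moves inversely to the LR
    base_momentum=0.85,
    max_momentum=0.95,
)

for epoch in range(epochs):
    for step in range(steps_per_epoch):
        # ... forward pass, loss computation, loss.backward() ...
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()      # advance the schedule once per batch, not per epoch
```

Note that the scheduler is stepped per batch: the one-cycle shape is defined over the total number of optimizer steps, not over epochs.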
Marketing Relevance
Especially effective for fine-tuning and classification tasks. Fastai uses one-cycle as its default training schedule (e.g. in fit_one_cycle).
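A hedged usage sketch with the fastai high-level API (fastai v2 assumed); the dataset path, architecture, epoch count, and lr_max are illustrative placeholders.

```python
# Training with fastai's default one-cycle schedule.
from fastai.vision.all import ImageDataLoaders, vision_learner, resnet34, accuracy

dls = ImageDataLoaders.from_folder("path/to/images", valid_pct=0.2)  # placeholder path
learn = vision_learner(dls, resnet34, metrics=accuracy)

learn.lr_find()                        # LR range test to pick a sensible peak LR
learn.fit_one_cycle(5, lr_max=3e-3)    # one-cycle schedule over 5 epochs
```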
Common Pitfalls
The maximum LR should be determined with an LR range test (LR finder); a peak that is set too high can destabilize training. One-cycle is not optimal for every task, and for LLM pre-training it is less common than plain warmup + cosine decay.
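A minimal sketch of such an LR range test, assuming a generic PyTorch model and data loader; the bounds, step count, and divergence heuristic are illustrative choices (fastai provides a built-in equivalent via learn.lr_find()).

```python
import torch

def lr_range_test(model, loader, loss_fn, lr_min=1e-6, lr_max=10.0, num_steps=200):
    """Exponentially increase the LR over a few hundred batches and record the loss."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr_min)
    growth = (lr_max / lr_min) ** (1.0 / num_steps)    # multiplicative LR step
    lrs, losses = [], []
    data_iter = iter(loader)
    for _ in range(num_steps):
        try:
            x, y = next(data_iter)
        except StopIteration:                          # restart the loader if exhausted
            data_iter = iter(loader)
            x, y = next(data_iter)
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        lrs.append(optimizer.param_groups[0]["lr"])
        losses.append(loss.item())
        if losses[-1] > 4 * min(losses):               # crude divergence check
            break
        for group in optimizer.param_groups:           # raise the LR for the next batch
            group["lr"] *= growth
    return lrs, losses   # choose max_lr somewhat below the LR where the loss blew up
```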
Origin & History
Leslie Smith (2017/2018) discovered super-convergence: certain LR schedules enable much faster training than standard schedules. Fast.ai (Jeremy Howard) popularized the method and made it the default schedule in the Fastai library.
Comparisons & Differences
One-Cycle Policy (Super-Convergence) vs. Cosine Annealing
Cosine annealing only decreases the LR; one-cycle first increases it (warmup phase) and additionally cycles momentum inversely, which is more aggressive but often converges faster.
One-Cycle Policy (Super-Convergence) vs. Warmup + Linear Decay
Warmup + decay is more conservative; one-cycle uses a higher peak LR and an inverse momentum schedule for faster convergence (compare the schedule shapes in the sketch below).
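To make the differences concrete, here is a small, self-contained sketch that evaluates the three schedule shapes as functions of training progress t in [0, 1]; the peak LR, warmup fraction, and div factor are illustrative, and the same lr_max is used for all three purely for comparability (in practice one-cycle typically runs with a higher peak than the other two).

```python
import math

def cosine_annealing(t, lr_max=0.1, lr_min=0.0):
    # decay only: starts at lr_max and falls to lr_min
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

def warmup_linear_decay(t, lr_max=0.1, warmup=0.1):
    # conservative: linear ramp to lr_max, then linear decay to zero
    if t < warmup:
        return lr_max * t / warmup
    return lr_max * (1 - (t - warmup) / (1 - warmup))

def one_cycle(t, lr_max=0.1, pct_start=0.3, div=25.0):
    # ramp from lr_max/div up to the peak, then cosine decay toward zero
    lr_start = lr_max / div
    if t < pct_start:
        return lr_start + (lr_max - lr_start) * t / pct_start
    decay_t = (t - pct_start) / (1 - pct_start)
    return lr_max * 0.5 * (1 + math.cos(math.pi * decay_t))

for t in (0.0, 0.15, 0.3, 0.6, 1.0):
    print(f"t={t:.2f}  cosine={cosine_annealing(t):.4f}  "
          f"warmup+linear={warmup_linear_decay(t):.4f}  one-cycle={one_cycle(t):.4f}")
```

Writing the schedules as pure functions of progress keeps the comparison framework-independent; in real training the same shapes would be driven per batch by a scheduler, as in the PyTorch sketch above.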