
    One-Cycle Policy (Super-Convergence)

    Also known as:
    1Cycle Policy
    Super-Convergence
    Smith Scheduler
    Updated: 2/10/2026

    A learning rate schedule that first ramps the LR up (warmup) and then anneals it down to a very low value – enabling training in a fraction of the usual epochs.

    Quick Summary

    The one-cycle policy combines an aggressive warmup with cosine decay and inverse momentum cycling – enabling "super-convergence": reaching the same accuracy in up to 10x fewer epochs.

    Explanation

    The LR rises linearly to its maximum, then falls via cosine decay to a value far below the starting LR. Simultaneously, momentum is cycled inversely: it falls while the LR rises and climbs back as the LR decays. The result is super-convergence – up to 10x faster training on suitable tasks.
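    The two phases can be written down in a few lines. Below is a minimal pure-Python sketch; the parameter names (pct_start, div_factor, final_div_factor) mirror PyTorch's torch.optim.lr_scheduler.OneCycleLR, but the function itself is illustrative and not any library's actual implementation:

    ```python
    import math

    def one_cycle(step, total_steps, max_lr, pct_start=0.3,
                  div_factor=25.0, final_div_factor=1e4,
                  mom_max=0.95, mom_min=0.85):
        """One-cycle schedule sketch: linear LR warmup, cosine LR decay,
        with momentum varied inversely to the LR."""
        warmup_steps = int(total_steps * pct_start)
        init_lr = max_lr / div_factor          # starting LR
        final_lr = init_lr / final_div_factor  # very low final LR
        if step < warmup_steps:
            # Warmup phase: LR rises linearly, momentum falls linearly.
            p = step / warmup_steps
            lr = init_lr + p * (max_lr - init_lr)
            mom = mom_max - p * (mom_max - mom_min)
        else:
            # Decay phase: cosine from max_lr down to final_lr,
            # momentum climbs back from mom_min to mom_max.
            p = (step - warmup_steps) / (total_steps - warmup_steps)
            cos = (1 + math.cos(math.pi * p)) / 2
            lr = final_lr + cos * (max_lr - final_lr)
            mom = mom_min + (1 - cos) * (mom_max - mom_min)
        return lr, mom
    ```

    For example, with total_steps=100 and max_lr=0.1, the LR starts at 0.1/25 = 0.004, peaks at 0.1 after 30 steps (while momentum bottoms out at 0.85), and ends near zero with momentum restored to 0.95.
    
    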

    Marketing Relevance

    Especially effective for fine-tuning and image classification. The fastai library has made one-cycle its default schedule (via Learner.fit_one_cycle).

    Common Pitfalls

    The maximum LR must be chosen carefully, typically with an LR range test ("LR finder"). The policy is not optimal for every task, and it is less common for LLM pre-training, where warmup followed by cosine decay dominates.
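    The LR range test amounts to sweeping geometrically increasing LRs for a short run, recording the loss, and backing off from where the loss bottoms out. A hypothetical sketch of that selection logic (the function names lr_sweep and suggest_max_lr are made up here; in practice fastai's Learner.lr_find does this against real training batches):

    ```python
    import math

    def lr_sweep(lr_min=1e-6, lr_max=10.0, num=100):
        """Geometrically spaced learning rates for an LR range test."""
        ratio = (lr_max / lr_min) ** (1 / (num - 1))
        return [lr_min * ratio**i for i in range(num)]

    def suggest_max_lr(lrs, losses):
        """Common heuristic: take the LR at the minimum of the (ideally
        smoothed) loss curve and back off by one order of magnitude."""
        i_min = min(range(len(losses)), key=lambda i: losses[i])
        return lrs[i_min] / 10.0
    ```

    With a synthetic loss curve whose minimum sits near LR = 1e-2, the heuristic would suggest a max_lr around 1e-3 – the backed-off value is what gets passed to the one-cycle schedule as its peak.
    
    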

    Origin & History

    Leslie Smith (2018) discovered super-convergence: certain LR schedules enable much faster training. Fast.ai (Jeremy Howard) popularized the method and made it the default schedule in the Fastai library.

    Comparisons & Differences

    One-Cycle Policy (Super-Convergence) vs. Cosine Annealing

    Cosine annealing only decreases the LR; one-cycle first increases it (warmup phase) and additionally cycles momentum – more aggressive, but often faster.

    One-Cycle Policy (Super-Convergence) vs. Warmup + Linear Decay

    Warmup + linear decay is more conservative; one-cycle uses a higher peak LR and inverse momentum cycling for faster convergence.
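    The contrast with plain cosine annealing can be made concrete with two toy LR curves (a pure-Python sketch under the same assumptions as above; neither function is a library implementation):

    ```python
    import math

    def cosine_annealing(step, total_steps, lr_max, lr_min=0.0):
        """Plain cosine annealing: the LR only decays, no warmup."""
        p = step / total_steps
        return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * p))

    def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3, div_factor=25.0):
        """One-cycle (LR only): linear warmup to max_lr, cosine decay after."""
        warmup = int(total_steps * pct_start)
        init_lr = max_lr / div_factor
        if step < warmup:
            return init_lr + (step / warmup) * (max_lr - init_lr)
        p = (step - warmup) / (total_steps - warmup)
        return max_lr * 0.5 * (1 + math.cos(math.pi * p))
    ```

    At step 0 with max_lr=0.1, cosine annealing already runs at the full 0.1, while one-cycle starts at 0.004 and only reaches 0.1 at the end of warmup – which is why one-cycle can tolerate a much higher peak LR.
    
    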

