
    Cyclical Learning Rate (CLR)

    Also known as:
    CLR
    Cyclical LR
    Triangular LR Schedule
    Updated: 2/12/2026

    A learning rate schedule that cyclically varies the LR between a minimum and a maximum; this prevents stagnation and helps the optimizer escape saddle points.

    Quick Summary

    Cyclical learning rates vary the LR periodically between a minimum and a maximum, which prevents stagnation; the technique was the predecessor of the one-cycle policy.

    Explanation

    The LR rises and falls in triangular, trapezoidal, or cosine-shaped cycles. Periodically increasing the LR can push the model out of sharp local minima and into better regions of the loss landscape.
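A minimal sketch of the triangular policy from Smith's paper; the parameter values are illustrative defaults, not prescribed ones:

```python
import math

def triangular_clr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Triangular cyclical learning rate (Smith, 2017).

    step_size is half a cycle in iterations: the LR rises linearly from
    base_lr to max_lr over step_size steps, then falls back symmetrically.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

With these defaults the LR starts at `base_lr`, peaks at `max_lr` after 2000 iterations, and returns to `base_lr` at iteration 4000, completing one cycle.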

    Marketing Relevance

    CLR was the predecessor of the one-cycle policy. Combined with the LR finder, it forms a very effective tuning strategy.

    Common Pitfalls

    Cycle length and LR range should be determined empirically, e.g. with the LR finder (LR range test). CLR is less common for LLM pre-training than warmup followed by cosine decay.
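The LR range used by CLR is typically chosen with Smith's LR range test: run a few hundred steps with an exponentially growing LR, record the loss, and pick the bounds from the region where the loss falls fastest. A sketch of the growing schedule (function name and defaults are illustrative):

```python
def lr_range_test_schedule(num_iters, min_lr=1e-7, max_lr=1.0):
    # Exponentially growing LRs for a short "range test" run: train with
    # these LRs, record the loss at each step, and choose [base_lr, max_lr]
    # for CLR from the region where the loss decreases fastest.
    ratio = max_lr / min_lr
    return [min_lr * ratio ** (i / (num_iters - 1)) for i in range(num_iters)]
```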

    Origin & History

    Leslie Smith (2017) introduced CLR in "Cyclical Learning Rates for Training Neural Networks." The method showed that periodically increasing LR helps find better solutions. Smith developed the one-cycle policy and LR finder from this.

    Comparisons & Differences

    Cyclical Learning Rate (CLR) vs. One-Cycle Policy

    CLR runs multiple cycles; the one-cycle policy uses exactly one cycle for the entire training run, which is more aggressive and often more effective.

    Cyclical Learning Rate (CLR) vs. Cosine Annealing with Warm Restarts

    CLR uses linear (triangular) cycles; SGDR uses cosine-shaped cycles with optional restarts. Same principle, different curve shape.
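For contrast with the triangular schedule, a minimal sketch of a cosine cycle with warm restarts; for simplicity this uses a fixed cycle length, whereas SGDR also allows cycles that grow by a multiplier after each restart:

```python
import math

def cosine_warm_restarts(iteration, min_lr=1e-5, max_lr=1e-2, cycle_len=2000):
    # Cosine annealing within each cycle; the LR "restarts" at max_lr
    # at the start of every cycle instead of rising back linearly.
    t = (iteration % cycle_len) / cycle_len  # position within current cycle, in [0, 1)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))
```

Each cycle starts at `max_lr`, decays along a cosine curve toward `min_lr`, then jumps back to `max_lr` at the next restart.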
