Cyclical Learning Rate (CLR)
Learning rate schedule that cyclically varies the LR between a lower and an upper bound – prevents stagnation, helps the optimizer past saddle points, and was the predecessor of the one-cycle policy.
Explanation
The LR rises and falls in triangular, trapezoidal, or cosine-shaped cycles. Periodically increasing the LR can "push" the model out of sharp local minima and help it find better regions of the loss landscape.
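To make the triangular cycle concrete, here is a minimal sketch of the triangular policy as described in Smith's paper; the bounds and step_size values are illustrative, not recommendations:

```python
import math

def triangular_clr(step: int, base_lr: float, max_lr: float, step_size: int) -> float:
    """Triangular CLR: the LR oscillates linearly between base_lr and max_lr.

    step_size is the number of steps in a half cycle; all values used
    below are illustrative.
    """
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# LR climbs from 1e-4 to 1e-2 over 2000 steps, descends again, repeats.
for step in (0, 1000, 2000, 3000, 4000):
    print(step, triangular_clr(step, base_lr=1e-4, max_lr=1e-2, step_size=2000))
```

In PyTorch the same policy is available as torch.optim.lr_scheduler.CyclicLR, which also offers the 'triangular2' and 'exp_range' variants that shrink the cycle amplitude over time.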
Marketing Relevance
CLR was the predecessor of the one-cycle policy. Combined with the LR finder, it forms a very effective tuning strategy.
Common Pitfalls
Cycle length and LR range must be determined with the LR finder (LR range test); poorly chosen bounds slow or destabilize training. For LLM pre-training, CLR is less common than warmup followed by cosine decay.
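The LR range test itself is simple; below is a minimal sketch in which the model and data are hypothetical placeholders for your own setup: ramp the LR exponentially over one short run, record the loss per step, and pick bounds from where the loss starts falling steadily to just before it diverges.

```python
import torch
from torch import nn

model = nn.Linear(10, 1)  # hypothetical stand-in for the real model
data = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(200)]

optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)
loss_fn = nn.MSELoss()

# Multiply the LR by a fixed factor each step so it sweeps 1e-6 -> 1.0
# across the 200 test batches.
gamma = (1.0 / 1e-6) ** (1.0 / len(data))

history = []  # (lr, loss) pairs to inspect or plot afterwards
for inputs, targets in data:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    history.append((optimizer.param_groups[0]["lr"], loss.item()))
    for group in optimizer.param_groups:
        group["lr"] *= gamma
```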
Origin & History
Leslie Smith introduced CLR in "Cyclical Learning Rates for Training Neural Networks" (WACV 2017). The paper showed that periodically increasing the LR helps find better solutions and also introduced the LR range test (LR finder); Smith's later work built the one-cycle policy on these ideas.
Comparisons & Differences
Cyclical Learning Rate (CLR) vs. One-Cycle Policy
CLR has multiple cycles; one-cycle uses exactly one cycle for the entire training – more aggressive and often more effective.
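As a usage contrast, PyTorch's built-in one-cycle scheduler runs a single warmup-then-anneal cycle over a known number of steps; the model, optimizer, and hyperparameters below are illustrative:

```python
import torch
from torch import nn

model = nn.Linear(10, 1)  # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# One cycle across the whole run: the LR warms up to max_lr during the
# first 30% of the 10,000 steps, then anneals far below its start value.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-2, total_steps=10_000, pct_start=0.3
)
# scheduler.step() is then called once per batch, not once per epoch.
```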
Cyclical Learning Rate (CLR) vs. Cosine Annealing with Warm Restarts
CLR uses linear triangular cycles; SGDR uses cosine-shaped cycles with warm restarts, jumping the LR back to its maximum at each cycle boundary. Similar principle, different curve shape.
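For comparison with the triangular sketch above, here is the cosine cycle under the simplifying assumption of a fixed cycle length (SGDR also lets cycles grow by a multiplicative factor after each restart); the bounds are again illustrative:

```python
import math

def sgdr_lr(step: int, min_lr: float, max_lr: float, cycle_len: int) -> float:
    """Cosine cycle with warm restarts: the LR decays along a cosine
    curve and jumps back to max_lr at the start of each cycle."""
    t_cur = step % cycle_len
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t_cur / cycle_len))

# Near-max at step 0, midpoint halfway through, near-min just before
# the restart at step 2000.
for step in (0, 1000, 1999, 2000):
    print(step, round(sgdr_lr(step, min_lr=1e-4, max_lr=1e-2, cycle_len=2000), 5))
```

PyTorch ships both shapes: CyclicLR for the triangular policy and CosineAnnealingWarmRestarts for SGDR.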