
    Cosine Annealing

    Also known as:
    Cosine Decay
    Cosine Schedule
    SGDR
    Updated: 2/10/2026

    A learning rate schedule strategy that gently reduces the learning rate from a maximum value to near zero following a cosine curve.

    Quick Summary

    Cosine annealing lowers the learning rate along a cosine curve. It is the standard schedule for LLM training and vision models, and it is gentler than step decay.

    Explanation

    Cosine annealing reduces the learning rate more gently than step decay and enables late-stage fine-tuning at very small rates. With warm restarts, the learning rate is periodically reset to its maximum and annealed again.
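The schedule itself is a short formula. Below is a minimal sketch in plain Python (the function name and parameters are illustrative, not from any specific library): the learning rate starts at `lr_max`, follows a half-cosine, and ends at `lr_min`.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine-annealed learning rate at a given training step."""
    # Fraction of training completed, clamped to [0, 1].
    progress = min(step / total_steps, 1.0)
    # Half-cosine from lr_max (progress=0) down to lr_min (progress=1).
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```

At step 0 this returns `lr_max`, at the final step it returns `lr_min`, and at the halfway point it returns the midpoint of the two.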

    Marketing Relevance

    Cosine annealing is the de facto standard for LLM pre-training and vision models. Almost all modern training recipes use it.

    Common Pitfalls

    The total number of training steps must be known in advance. Warm restarts require tuning the cycle length. Cosine annealing is not always better than linear decay.
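The cycle-length tuning mentioned above can be seen in a warm-restart variant. This is a hedged sketch of an SGDR-style schedule, not a specific library's implementation; `cycle_len` and `t_mult` are hypothetical parameter names (in SGDR, each cycle is commonly made longer than the last by a multiplier).

```python
import math

def sgdr_lr(step, cycle_len, lr_max, lr_min=0.0, t_mult=2):
    """Cosine annealing with warm restarts (SGDR-style sketch).

    Within each cycle the LR follows a cosine curve from lr_max to
    lr_min; at the start of the next cycle it resets to lr_max.
    Each cycle is t_mult times longer than the previous one.
    """
    # Walk forward through cycles until `step` falls inside one.
    t = step
    length = cycle_len
    while t >= length:
        t -= length
        length *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / length))
```

With `cycle_len=10` and `t_mult=2`, the learning rate resets to its maximum at steps 0, 10, and 30, which is exactly the sensitivity the pitfall refers to: restart timing changes when the model sees high learning rates again.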

    Origin & History

    Loshchilov & Hutter (2017) introduced SGDR (SGD with Warm Restarts), combining cosine annealing with periodic restarts. The Chinchilla paper (2022) used cosine decay for compute-optimal LLM training, and it has been standard practice since.

    Comparisons & Differences

    Cosine Annealing vs. Step Decay

    Step decay reduces LR abruptly at fixed intervals; cosine annealing lowers it smoothly and continuously.

    Cosine Annealing vs. Linear Decay

    Linear decay lowers the learning rate at a uniform rate; cosine annealing decreases more slowly at first and faster later, keeping the learning rate higher for longer in early training.
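The difference is easy to verify numerically. The sketch below (function names are illustrative) compares the two schedules: early in training the cosine schedule stays above linear decay, and late in training it falls below it.

```python
import math

def linear_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate decayed uniformly from lr_max to lr_min."""
    progress = min(step / total_steps, 1.0)
    return lr_max - (lr_max - lr_min) * progress

def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate decayed along a half-cosine from lr_max to lr_min."""
    progress = min(step / total_steps, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# At 10% progress, linear has shed 10% of the range,
# while cosine has shed only about 2.4% and is still near lr_max.
```

For example, at 10% of a 100-step run with `lr_max=1.0`, linear decay gives 0.9 while cosine annealing still gives about 0.976; at 90% the ordering flips.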

