
    RMSprop

    Also known as:
    Root Mean Square Propagation
    RMSProp Optimizer
    Updated: 2/10/2026

    Adaptive optimizer that solves AdaGrad's vanishing-learning-rate problem by using an exponentially weighted average of squared gradients instead of their running sum.

    Quick Summary

    RMSprop fixes AdaGrad's monotonically decreasing learning rate by exponentially forgetting old gradients. It is the direct predecessor of Adam and was never formally published.

    Explanation

    RMSprop "forgets" old gradients and focuses on recent ones: instead of summing squared gradients like AdaGrad, it keeps an exponentially weighted moving average of them. As a result, the effective learning rate does not decay monotonically toward zero and the model keeps training. Hinton presented the method in a Coursera lecture; it was never formally published.
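
    The mechanics fit in a few lines. The NumPy sketch below shows one RMSprop step; the function name and the default hyperparameters (lr=0.001, decay=0.9, eps=1e-8) are common conventions assumed here for illustration, not values taken from Hinton's slides.

```python
import numpy as np

def rmsprop_update(param, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSprop step (illustrative sketch)."""
    # Exponentially weighted average of squared gradients:
    # old gradients are gradually forgotten instead of summed forever.
    cache = decay * cache + (1 - decay) * grad ** 2
    # Per-parameter step, scaled by the root mean square of recent gradients.
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache

# Toy usage: minimize f(w) = ||w||^2
w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
for _ in range(3000):
    grad = 2 * w                      # gradient of ||w||^2
    w, cache = rmsprop_update(w, grad, cache)
print(w)                              # close to [0, 0], within about one step size (lr)
```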

    Marketing Relevance

    RMSprop was the most popular adaptive optimizer before Adam. It remains relevant as a building block of Adam and as a common choice for reinforcement learning tasks.

    Common Pitfalls

    RMSprop has no momentum term (unlike Adam). It was never formally published and is described only in lecture slides. For LLM training it has largely been replaced by AdamW.

    Origin & History

    Geoffrey Hinton presented RMSprop in 2012 in his Coursera course "Neural Networks for Machine Learning", without a formal publication. It nevertheless became the standard optimizer until Adam (2014) unified both ideas: adaptive learning rates and momentum.

    Comparisons & Differences

    RMSprop vs. AdaGrad

    AdaGrad accumulates squared gradients without bound, so its effective learning rate decays toward zero; RMSprop uses an exponentially weighted average and therefore maintains a usable learning rate, as the toy comparison below shows.
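
    The sketch below assumes a constant gradient of 1.0 and illustrative hyperparameters: after 1000 steps AdaGrad's effective step size has shrunk by roughly two orders of magnitude, while RMSprop's has leveled off near the base learning rate.

```python
import numpy as np

lr, decay, eps = 0.01, 0.9, 1e-8
acc, avg = 0.0, 0.0
for _ in range(1000):
    g = 1.0
    acc += g ** 2                              # AdaGrad: unbounded sum
    avg = decay * avg + (1 - decay) * g ** 2   # RMSprop: exponential average

print(lr / (np.sqrt(acc) + eps))   # ~0.0003 -> AdaGrad step is nearly frozen
print(lr / (np.sqrt(avg) + eps))   # ~0.01   -> RMSprop step stays usable
```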

    RMSprop vs. Adam

    RMSprop only adapts learning rates via the second moment of the gradients; Adam adds momentum via the first moment, plus bias correction for both estimates. Adam is essentially "RMSprop + momentum".
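
    As a sketch of that relationship (loosely following the Adam update rule with bias correction; variable names and defaults are assumptions for illustration), the only additions over the RMSprop step shown earlier are the first-moment estimate m and the bias-correction terms.

```python
import numpy as np

def adam_update(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step (illustrative sketch); t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad          # 1st moment: momentum (new vs. RMSprop)
    v = beta2 * v + (1 - beta2) * grad ** 2     # 2nd moment: RMSprop's exponential average
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

    Setting beta1 = 0 in this sketch recovers an RMSprop-like update (with a bias-corrected second moment), which is what "Adam = RMSprop + momentum" means in practice.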

