    Momentum

    Also known as:
    Momentum Optimization
    Heavy Ball Method
    Updated: 2/10/2026

    Acceleration technique for gradient descent that accumulates past gradient directions to converge faster and escape shallow local minima.

    Quick Summary

    Momentum accelerates SGD by accumulating past gradients, like a ball rolling downhill that gathers enough speed to carry over small bumps (shallow local minima). A typical default value is 0.9.

    Explanation

    Momentum adds a weighted fraction of the previous update to the current one: the velocity v is updated as v = beta * v + grad, and the parameters as theta = theta - lr * v, where beta is the momentum coefficient (typically 0.9) and lr is the learning rate. Like a rolling ball, the optimizer accelerates along directions where successive gradients agree and damps oscillations where they alternate.
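
    A minimal sketch of that update rule in NumPy; the names (momentum_step, beta, lr) are illustrative, not from any particular library:

        import numpy as np

        def momentum_step(theta, grad, velocity, lr=0.01, beta=0.9):
            """One SGD-with-momentum update (heavy-ball form)."""
            # Accumulate a decaying sum of past gradients.
            velocity = beta * velocity + grad
            # Step against the accumulated direction.
            theta = theta - lr * velocity
            return theta, velocity

        # Toy usage: minimize f(x) = x^2, whose gradient is 2x.
        theta = np.array([5.0])
        velocity = np.zeros_like(theta)
        for _ in range(100):
            theta, velocity = momentum_step(theta, 2 * theta, velocity)
        print(theta)  # approaches 0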

    Marketing Relevance

    Momentum is a standard component of most modern optimizers, either explicitly (SGD with momentum) or as the first moment inside Adam. A typical value is 0.9.

    Common Pitfalls

    A momentum value set too high can cause the optimizer to overshoot the minimum and oscillate around it. Momentum also interacts with the learning rate: for a roughly constant gradient, accumulation scales the effective step size by about 1/(1 - beta), so raising momentum may require lowering the learning rate.
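
    A quick numeric check of that scaling, assuming a constant gradient (the variable names are illustrative):

        # With a constant gradient g, the velocity converges to the
        # geometric series g * (1 + beta + beta^2 + ...) = g / (1 - beta).
        beta, g = 0.9, 1.0
        velocity = 0.0
        for _ in range(200):
            velocity = beta * velocity + g
        print(velocity)  # ~10.0, i.e. the step is about 10x the raw gradient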

    Origin & History

    Boris Polyak introduced the heavy-ball method in 1964. Nesterov's accelerated gradient (1983) evaluates the gradient at a look-ahead point and improves convergence rates. Momentum was later integrated into Adam (2015) as the first moment estimate.

    Comparisons & Differences

    Momentum vs. Nesterov Momentum

    Standard momentum computes the gradient at the current parameters; Nesterov momentum computes it at a look-ahead point, the position the accumulated velocity is about to carry the parameters to. Correcting the step with this foresight typically gives better convergence.
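
    A minimal sketch of the look-ahead variant, in the same illustrative style as above (nesterov_step and grad_fn are hypothetical names; grad_fn returns the gradient at given parameters, which is needed because the gradient is evaluated at a shifted point):

        import numpy as np

        def nesterov_step(theta, grad_fn, velocity, lr=0.01, beta=0.9):
            """One Nesterov momentum update."""
            # Peek at where the current velocity is about to move us.
            lookahead = theta - lr * beta * velocity
            # Evaluate the gradient there instead of at theta.
            velocity = beta * velocity + grad_fn(lookahead)
            theta = theta - lr * velocity
            return theta, velocity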

    Momentum vs. Adam (Adaptive Moment)

    Momentum tracks only the first moment (a running mean of gradients); Adam additionally tracks the second moment (a running mean of squared gradients) and uses it to scale the learning rate per parameter.
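
    For comparison, a compact sketch of the Adam update with its standard defaults (beta1=0.9, beta2=0.999, eps=1e-8); adam_step is an illustrative name:

        import numpy as np

        def adam_step(theta, grad, m, v, t, lr=0.001,
                      beta1=0.9, beta2=0.999, eps=1e-8):
            """One Adam update; t is the 1-based step count."""
            m = beta1 * m + (1 - beta1) * grad      # first moment (momentum)
            v = beta2 * v + (1 - beta2) * grad**2   # second moment
            m_hat = m / (1 - beta1**t)              # correct zero-init bias
            v_hat = v / (1 - beta2**t)
            theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
            return theta, m, v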
