
    Gradient Noise

    Also known as:
    Stochastic Gradient Noise
    Mini-Batch Noise
    SGD Noise
    Updated: 2/10/2026

The natural noise in gradient estimates that arises from mini-batch sampling; it acts as implicit regularization and helps SGD find better minima.

    Quick Summary

    Gradient noise from mini-batch sampling is not a bug but a feature: it acts as natural regularization and helps SGD find flatter, better minima.

    Explanation

Each mini-batch provides a noisy estimate of the true (full-batch) gradient. This noise helps the optimizer escape sharp minima and settle into flatter solutions that tend to generalize better.
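
To make the noise concrete, here is a minimal numpy sketch on a toy least-squares problem (the problem setup, sizes, and batch sizes are illustrative choices, not taken from the original text): it compares the full-batch gradient with gradients computed on random mini-batches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression: loss(w) = mean((X @ w - y)**2) / 2
N, d = 10_000, 5
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=N)

w = np.zeros(d)  # current parameters

def gradient(Xb, yb, w):
    """Gradient of the squared-error loss over the given (mini-)batch."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

full_grad = gradient(X, y, w)  # the "true" full-batch gradient

# Mini-batch gradients scatter around full_grad; the scatter shrinks
# as the batch size grows.
for B in (16, 256, 4096):
    estimates = []
    for _ in range(200):
        idx = rng.choice(N, size=B, replace=False)
        estimates.append(gradient(X[idx], y[idx], w))
    noise = np.array(estimates) - full_grad
    print(f"batch size {B:5d}: mean gradient-noise norm "
          f"{np.linalg.norm(noise, axis=1).mean():.4f}")
```

The mini-batch gradients are unbiased estimates of the full-batch gradient, but their spread shrinks roughly with the square root of the batch size, and that spread is the "gradient noise" this entry describes.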

    Marketing Relevance

    Gradient noise explains why smaller batch sizes often generalize better and why SGD finds flatter minima than full-batch GD.

    Common Pitfalls

Too much noise (from batches that are too small) can prevent convergence; too little noise (from batches that are too large) can hurt generalization.
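
A common rule of thumb for navigating this trade-off (following Smith & Le's analysis mentioned below; the notation is the conventional one, not taken from the original text) ties the effective noise scale to the learning rate and batch size:

```latex
g \;\approx\; \epsilon \, \frac{N}{B}
% g: noise scale, \epsilon: learning rate, N: training-set size, B: batch size
```

Under this approximation, halving the batch size and doubling the learning rate raise the noise level by a similar amount, which is one common motivation for scaling the learning rate together with the batch size.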

    Origin & History

The regularizing effect of SGD noise has been studied intensively since around 2015. Keskar et al. (2017) showed that large-batch training tends to converge to sharp minima that generalize worse, and Smith & Le (2018) analyzed SGD noise from a Bayesian perspective, relating its scale to the learning rate and batch size.

    Comparisons & Differences

    Gradient Noise vs. Dropout

    Dropout adds explicit noise to activations (regularization by design); gradient noise arises naturally through mini-batch sampling.
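
For contrast, a minimal PyTorch-style sketch of the explicit noise dropout injects (layer sizes and dropout rate are arbitrary placeholders): two forward passes on the same input differ because dropout randomly zeroes activations, independent of how the mini-batch was sampled.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Dropout(p=0.5))
layer.train()  # dropout is only active in training mode

x = torch.randn(1, 8)
# Two forward passes on the same input give different outputs:
# dropout adds noise by design, regardless of mini-batch sampling.
print(layer(x))
print(layer(x))
```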

    Gradient Noise vs. Gradient Clipping

Gradient clipping caps the magnitude of the gradient to guard against exploding gradients; gradient noise describes the natural variance of mini-batch gradient estimates and is a feature, not a problem.
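
And a minimal PyTorch-style sketch of gradient clipping for one training step (the model, data, and max_norm value are placeholders): clipping caps the magnitude of the already-computed mini-batch gradient; it does not remove the sampling noise itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 8), torch.randn(32, 1)  # one mini-batch

optimizer.zero_grad()
loss = F.mse_loss(model(x), y)
loss.backward()  # produces the (noisy) mini-batch gradient
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap its norm
optimizer.step()
```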
