
    Gradient Clipping

    Also known as:
    Gradient Norm Clipping
    Grad Clip
    Updated: 2/9/2026

    Gradient clipping limits the norm or value of gradients during training to prevent exploding gradients.

    Quick Summary

    Gradient clipping limits gradient norms to prevent exploding gradients – a standard technique for stable LLM and transformer training.

    Explanation

    When the overall gradient norm exceeds a threshold, all gradients are scaled down proportionally so the norm equals that threshold (g ← g · max_norm / ‖g‖). This is standard practice in LLM training, typically with max_norm=1.0. Two variants exist: clipping by norm (rescaling the whole gradient vector) and clipping by value (clamping each gradient element to a fixed range).
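    The snippet below is a minimal sketch of both variants in PyTorch; the model, data, and hyperparameter choices (a linear layer, MSE loss, SGD, max_norm=1.0) are illustrative assumptions, not part of the original entry.

```python
import torch
import torch.nn as nn

# Illustrative setup: tiny model, synthetic data, plain SGD
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 16), torch.randn(32, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()

# Variant 1: clip by norm -- if the total gradient norm exceeds max_norm,
# every gradient is scaled by max_norm / total_norm (direction preserved).
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Variant 2: clip by value -- each gradient element is clamped to
# [-clip_value, clip_value] independently (direction may change).
# torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)

optimizer.step()
```

    Clipping is applied after backward() and before the optimizer step, so the optimizer only ever sees the bounded gradients.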

    Marketing Relevance

    Essential for stable training of RNNs, transformers, and LLMs – without gradient clipping, training often diverges.

    Origin & History

    Pascanu et al. (2013) formalized gradient clipping for RNNs. With the rise of transformers and LLMs, gradient clipping (typically max_norm=1.0) became standard practice in large training runs (GPT, LLaMA, etc.).

    Comparisons & Differences

    Gradient Clipping vs. Vanishing Gradient

    Gradient clipping addresses exploding gradients (gradients that grow too large); vanishing gradients (gradients that shrink toward zero) require different remedies, such as skip connections and normalization.


    Related Terms

    Exploding Gradient
    Vanishing Gradient
    Training Stability
    Optimizer
    LLM Training