
    Sharpness-Aware Minimization (SAM)

    Also known as:
    SAM
    Sharpness-Aware Optimizer
    Updated: 2/12/2026

    Optimization method that minimizes not only the loss but also the "sharpness" of the loss landscape – finds flatter minima for better generalization.

    Quick Summary

    SAM explicitly seeks flat minima through an adversarial weight perturbation – better generalization at the cost of roughly 2x compute per step.
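Formally, SAM (Foret et al., 2021) poses this as a min-max objective over a weight perturbation of radius ρ (regularization term omitted here), with the inner maximization approximated in closed form by a normalized gradient ascent step:

```latex
\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} \; L(w + \epsilon),
\qquad
\hat{\epsilon}(w) = \rho \, \frac{\nabla L(w)}{\|\nabla L(w)\|_2}
```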

    Explanation

    SAM performs two gradient computations per step: it first perturbs the weights in the direction of steepest loss increase (a bounded adversarial ascent step), then computes the update gradient at that perturbed point and applies it to the original weights. Result: parameters land in flat, robust regions of the loss landscape.
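The two-step update can be sketched on a toy quadratic loss. This is an illustrative NumPy sketch, not the paper's implementation; the loss, learning rate, and ρ are made up for the example:

```python
import numpy as np

# Toy loss: 0.5 * w^T A w, with one sharp direction (10.0) and one flat one (0.1).
A = np.diag([10.0, 0.1])

def loss(w):
    return 0.5 * w @ A @ w

def grad(w):
    return A @ w

def sam_step(w, lr=0.05, rho=0.05):
    g = grad(w)                                   # 1st gradient: ascent direction
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # adversarial perturbation of radius rho
    g_adv = grad(w + eps)                         # 2nd gradient: at the perturbed weights
    return w - lr * g_adv                         # descend from the ORIGINAL weights

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w)
```

Each step costs two gradient evaluations, which is where the roughly 2x overhead comes from.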

    Marketing Relevance

    SAM significantly improves generalization in vision models. Google uses SAM for ViT training. Especially effective with limited data.

    Common Pitfalls

    Roughly 2x compute cost per step from the two gradient computations. ASAM (Adaptive SAM) makes the sharpness measure scale-invariant rather than removing that overhead. Not always worthwhile for LLM training.

    Origin & History

    Foret et al. (Google, 2021) introduced SAM, showing consistent improvements across diverse benchmarks. ASAM (Kwon et al., 2021) made the sharpness measure adaptive. SAM became standard for Google's ViT training.

    Comparisons & Differences

    Sharpness-Aware Minimization (SAM) vs. AdamW

    AdamW minimizes only loss; SAM minimizes loss AND landscape sharpness. SAM can be layered on AdamW (SAM + AdamW).
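The layering works because SAM only changes which gradient the base optimizer consumes. A minimal sketch of that interface, assuming a generic `grad_fn` and base update rule (not a real library API):

```python
import numpy as np

def sam_gradient(w, grad_fn, rho=0.05):
    """Return the gradient at the adversarially perturbed weights.

    Feed this to any base optimizer (AdamW, SGD, ...) in place of grad_fn(w);
    the base update rule itself stays unchanged.
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascend to the sharpest nearby point
    return grad_fn(w + eps)

# usage with any base update rule, e.g.:
# w = adamw_update(w, sam_gradient(w, grad_fn))
```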

    Sharpness-Aware Minimization (SAM) vs. Stochastic Weight Averaging (SWA)

    SWA averages checkpoints for flatter solutions post-hoc; SAM actively seeks flat minima during training.
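The post-hoc part of SWA is just a running mean over weight snapshots collected late in training. A minimal sketch (hypothetical checkpoint list, not a specific framework's API):

```python
import numpy as np

def swa_average(checkpoints):
    """Incremental mean of weight snapshots taken during training."""
    avg = np.zeros_like(checkpoints[0])
    for i, w in enumerate(checkpoints, start=1):
        avg += (w - avg) / i   # running mean: avg_i = avg_{i-1} + (w_i - avg_{i-1}) / i
    return avg
```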

