    Artificial Intelligence

    ZeRO (Zero Redundancy Optimizer)

    Also known as:
    ZeRO Optimizer
    Zero Redundancy Optimizer
    DeepSpeed ZeRO
    ZeRO-1/2/3
    Updated: 2/11/2026

    A memory optimization for distributed training that shards optimizer states, gradients, and parameters across GPUs instead of replicating them on every GPU – enables training of models with up to trillions of parameters.

    Quick Summary

    ZeRO shards optimizer states, gradients, and parameters across GPUs – eliminates redundancy and enables training of models that otherwise wouldn't fit in GPU memory.

    Explanation

    ZeRO has 3 stages: ZeRO-1 (shard optimizer states, up to 4x memory reduction), ZeRO-2 (+gradients, up to 8x), ZeRO-3 (+parameters, reduction scales linearly with the number of GPUs). ZeRO-Infinity extends this by offloading to CPU memory and NVMe. With N GPUs, each GPU holds only 1/N of every sharded state.
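
    The stage-by-stage savings can be checked with a few lines of arithmetic. The sketch below assumes the accounting used in the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and K = 12 for fp32 optimizer states. The function name is made up for illustration, and activations and temporary buffers are not counted.

```python
def zero_model_state_bytes(num_params, num_gpus, stage, k=12):
    """Per-GPU bytes for model states under mixed-precision Adam.

    Assumed accounting (ZeRO paper): 2 bytes/param for fp16 weights,
    2 bytes/param for fp16 gradients, k bytes/param for fp32 optimizer
    states (k=12 for Adam: master weights, momentum, variance).
    stage=0 means plain data parallelism (everything replicated).
    """
    params = 2 * num_params
    grads = 2 * num_params
    optim = k * num_params
    if stage >= 1:
        optim /= num_gpus   # ZeRO-1: shard optimizer states
    if stage >= 2:
        grads /= num_gpus   # ZeRO-2: also shard gradients
    if stage >= 3:
        params /= num_gpus  # ZeRO-3: also shard parameters
    return params + grads + optim


if __name__ == "__main__":
    # 7.5B parameters on 64 GPUs, the configuration used in the ZeRO paper.
    for stage in range(4):
        gb = zero_model_state_bytes(7.5e9, 64, stage) / 1e9
        print(f"stage {stage}: {gb:.1f} GB per GPU")
```

    The 4x and 8x figures are the limits for large N: with the optimizer states sharded away, 16 bytes per parameter shrink toward 4, and with gradients sharded as well, toward 2.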

    Marketing Relevance

    ZeRO reshaped LLM training: without it, 100B+ models do not fit into plain data-parallel training on standard GPU clusters and need additional model parallelism. ZeRO is the core of DeepSpeed and the inspiration for PyTorch FSDP.

    Example

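    Training a 13B model with mixed-precision Adam: model states (fp16 weights and gradients plus fp32 optimizer states) take about 16 bytes per parameter, so without ZeRO each GPU needs ~208GB, more than a single GPU offers. With ZeRO-3 on 8 GPUs, each holds only ~26GB of model states – an 8x reduction. Activations and temporary buffers come on top in both cases.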

    Common Pitfalls

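    ZeRO-3 has higher communication overhead than ZeRO-1/2 because parameters must be gathered before each forward and backward pass. ZeRO-Infinity offloading is slower than keeping everything on the GPU, since it is bound by CPU and NVMe bandwidth. Configuration is not trivial (stage choice, offloading options).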
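
    To make the configuration choices concrete, here is a minimal sketch of a DeepSpeed config that selects a stage and enables CPU offloading, built as a Python dict and serialized to JSON. The values are illustrative assumptions, not tuned recommendations, and the file name is hypothetical; check the keys against the DeepSpeed configuration documentation for your version.

```python
import json

# Illustrative starting point only; batch sizes and offload choices
# are assumptions and depend on the model and hardware.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                    # 1, 2, or 3
        "overlap_comm": True,          # overlap communication with compute
        "contiguous_gradients": True,  # reduce memory fragmentation
        # Offloading trades speed for GPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Parameter offloading only applies to stage 3.
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}


def write_config(path="ds_config.json"):
    """Write the config to a JSON file that DeepSpeed can load."""
    with open(path, "w") as f:
        json.dump(ds_config, f, indent=2)


if __name__ == "__main__":
    print(json.dumps(ds_config, indent=2))
```

    Starting at the lowest stage that fits in memory and adding offloading only when needed keeps the communication and transfer costs described above as small as possible.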

    Origin & History

    Rajbhandari et al. (Microsoft, 2020) published ZeRO as part of DeepSpeed. ZeRO-Infinity (2021) extended it to CPU/NVMe offloading. PyTorch FSDP (2022) implemented ZeRO-3-like functionality natively. Today ZeRO-style sharding is standard practice in large-scale LLM training.

    Comparisons & Differences

    ZeRO (Zero Redundancy Optimizer) vs. FSDP

    ZeRO is DeepSpeed's implementation; FSDP is PyTorch's native implementation of the same concept (sharding parameters, gradients, and optimizer states across data-parallel GPUs).

    ZeRO (Zero Redundancy Optimizer) vs. Data Parallelism (DDP)

    DDP replicates everything on each GPU; ZeRO shards and gathers on demand – dramatically less memory.
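
    The difference can be illustrated with a toy simulation in plain Python. This is only a sketch: the "ranks" are list entries in a single process, and `shard` and `all_gather` are hypothetical helpers standing in for real collective operations, not DeepSpeed or PyTorch APIs.

```python
def shard(values, num_ranks):
    """Split a flat parameter list into num_ranks contiguous shards."""
    size = -(-len(values) // num_ranks)  # ceiling division
    return [values[r * size:(r + 1) * size] for r in range(num_ranks)]


def all_gather(shards):
    """Stand-in for a collective all-gather: concatenate every rank's shard."""
    return [v for s in shards for v in s]


if __name__ == "__main__":
    params = [0.1 * i for i in range(8)]  # toy "model" with 8 parameters
    num_ranks = 4

    # DDP: every rank stores a full copy of the parameters.
    ddp_per_rank = len(params)

    # ZeRO-3 style: each rank stores only its own shard.
    shards = shard(params, num_ranks)
    sharded_per_rank = max(len(s) for s in shards)

    # Before a layer runs, the full parameters are rebuilt temporarily...
    full = all_gather(shards)
    assert full == params
    # ...and dropped afterwards, so only the shard stays resident.
    del full

    print(f"DDP stores {ddp_per_rank} values per rank, "
          f"sharded stores {sharded_per_rank}")
```

    The repeated gather is the price of the memory saving: each full copy exists only briefly, but it has to be communicated every time it is needed.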
