
    Progressive Shrinking

    Also known as:
    Gradual Compression
    Progressive Compression
    Iterative Shrinking
    Updated: 2/11/2026

A training technique that progressively shrinks a large network – first in kernel size, then in depth, then in width – to train a single supernet that supports many subnetworks.

    Quick Summary

Progressive Shrinking gradually reduces a network's kernel size, depth, and width – the key technique enabling Once-for-All supernets.

    Explanation

Progressive shrinking first trains the full network at maximum kernel size, depth, and width, then progressively adds smaller variants to the training mix: Phase 1 (Elastic Kernel), Phase 2 (Elastic Depth), Phase 3 (Elastic Width). In each phase, randomly sampled subnetworks are trained with knowledge distillation from the full model.
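A minimal sketch of this schedule, assuming a toy 1-D supernet with shared weights and random stand-in data – `ElasticBlock`, `ToySupernet`, `PHASES`, and `sample_cfg` are hypothetical names, not part of the OFA codebase. Real OFA uses 2-D inverted-residual blocks, learned kernel transformation matrices, channel sorting by importance, and per-phase fine-tuning; here each phase simply enlarges the space of sampled subnetworks while the full configuration provides soft labels:

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticBlock(nn.Module):
    """Residual 1-D conv block with elastic kernel size and elastic width."""
    def __init__(self, ch=16, max_mid=64, max_k=7):
        super().__init__()
        self.max_k = max_k
        # Largest-case parameters; smaller subnets reuse slices of them.
        self.w1 = nn.Parameter(torch.randn(max_mid, ch, max_k) * 0.05)
        self.w2 = nn.Parameter(torch.randn(ch, max_mid, 1) * 0.05)

    def forward(self, x, k, mid):
        # Center-slice the largest kernel to size k and keep the first `mid`
        # channels. (OFA additionally applies learned kernel transformation
        # matrices and keeps the most important channels; plain slicing
        # keeps the sketch short.)
        s = (self.max_k - k) // 2
        h = F.relu(F.conv1d(x, self.w1[:mid, :, s:s + k], padding=k // 2))
        return x + F.conv1d(h, self.w2[:, :mid, :])

class ToySupernet(nn.Module):
    def __init__(self, n_blocks=4, ch=16):
        super().__init__()
        self.stem = nn.Conv1d(1, ch, 3, padding=1)
        self.blocks = nn.ModuleList(ElasticBlock(ch) for _ in range(n_blocks))
        self.head = nn.Linear(ch, 10)

    def forward(self, x, cfg):
        h = self.stem(x)
        for blk in self.blocks[:cfg["depth"]]:      # elastic depth: skip the rest
            h = blk(h, cfg["kernel"], cfg["mid"])
        return self.head(h.mean(dim=-1))

# Each phase enlarges the sampling space along one new dimension.
PHASES = [
    dict(kernel=[7],       depth=[4],       mid=[64]),          # full network
    dict(kernel=[3, 5, 7], depth=[4],       mid=[64]),          # + elastic kernel
    dict(kernel=[3, 5, 7], depth=[2, 3, 4], mid=[64]),          # + elastic depth
    dict(kernel=[3, 5, 7], depth=[2, 3, 4], mid=[16, 32, 64]),  # + elastic width
]

def sample_cfg(phase):
    return {dim: random.choice(opts) for dim, opts in phase.items()}

supernet = ToySupernet()
full_cfg = dict(kernel=7, depth=4, mid=64)      # the full model as teacher
opt = torch.optim.SGD(supernet.parameters(), lr=0.05)

for phase in PHASES:
    for step in range(100):                     # toy per-phase budget
        x = torch.randn(8, 1, 32)               # stand-ins for real data
        y = torch.randint(0, 10, (8,))
        # Soft labels from the full configuration. In this sketch the teacher
        # shares the supernet's continually updated weights; the paper keeps
        # the fully trained largest network as the distillation teacher.
        with torch.no_grad():
            t = F.softmax(supernet(x, full_cfg), dim=-1)
        logits = supernet(x, sample_cfg(phase))  # one random subnet per step
        loss = F.cross_entropy(logits, y) + F.kl_div(
            F.log_softmax(logits, dim=-1), t, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
```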

    Marketing Relevance

The central technique behind Once-for-All networks – it enables training a single supernet from which subnetworks matching different hardware constraints can be extracted without retraining.

    Example

In OFA, an ImageNet model is progressively shrunk: smaller kernel sizes (7→5→3) are trained first, then reduced depths (dropping layers), and finally reduced widths (fewer channels). The result: one trained model, many deployment options.
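A small illustration of the weight sharing that makes the kernel phase cheap: the 5×5 and 3×3 kernels are center slices of the 7×7 weight tensor, so all three sizes reuse one set of parameters. (OFA additionally passes the slice through a learned linear transformation before use; plain slicing is shown for brevity.)

```python
import torch

w7 = torch.randn(32, 16, 7, 7)   # full 7x7 conv weight: (out, in, kH, kW)
w5 = w7[:, :, 1:6, 1:6]          # center 5x5 view of the same parameters
w3 = w7[:, :, 2:5, 2:5]          # center 3x3 view

for w in (w7, w5, w3):
    print(tuple(w.shape))        # (32, 16, 7, 7), (32, 16, 5, 5), (32, 16, 3, 3)
```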

    Common Pitfalls

The multi-phase training pipeline is complex, the order in which the dimensions are shrunk matters, and each phase requires careful hyperparameter tuning.
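To make the tuning burden concrete, a hypothetical per-phase schedule might look like the following; all numbers are placeholders, not the values from the OFA paper:

```python
# Hypothetical per-phase hyperparameters: every phase typically gets its own
# epoch budget, learning rate, and number of subnetworks sampled per step.
SCHEDULE = [
    {"phase": "full network",   "epochs": 180, "lr": 1e-1, "subnets_per_step": 1},
    {"phase": "elastic kernel", "epochs": 120, "lr": 3e-2, "subnets_per_step": 1},
    {"phase": "elastic depth",  "epochs": 120, "lr": 1e-2, "subnets_per_step": 2},
    {"phase": "elastic width",  "epochs": 120, "lr": 1e-2, "subnets_per_step": 4},
]
```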

    Origin & History

    Introduced by Cai et al. (2020) as the core method of the Once-for-All framework. Inspired by curriculum learning and gradual pruning (Zhu & Gupta, 2017).

    Comparisons & Differences

    Progressive Shrinking vs. One-Shot NAS

One-Shot NAS samples subnetworks from the full search space from the very first training step; Progressive Shrinking unlocks the shrinkable dimensions one at a time, which makes training the smaller subnetworks more stable.
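A toy contrast of the two sampling regimes (hypothetical helper names; the search space mirrors the phase table above):

```python
import random

SPACE = {"kernel": [3, 5, 7], "depth": [2, 3, 4], "width": [0.25, 0.5, 1.0]}

def sample_one_shot():
    # One-Shot NAS: the full space is sampled from the first step.
    return {d: random.choice(v) for d, v in SPACE.items()}

def sample_progressive(phase):
    # Progressive Shrinking: dimensions unlock one at a time; locked
    # dimensions stay at their largest value.
    unlocked = ["kernel", "depth", "width"][:phase]
    return {d: random.choice(v) if d in unlocked else max(v)
            for d, v in SPACE.items()}

print(sample_one_shot())
print(sample_progressive(phase=1))   # only the kernel dimension varies yet
```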

