Progressive Shrinking
A training technique that progressively shrinks a large network – first kernel size, then depth, then width – so that a single supernet supports many subnetworks.
Progressive Shrinking gradually reduces a network along the kernel, depth, and width dimensions; it is the key technique that makes Once-for-All supernets trainable in a single run.
Explanation
Progressive Shrinking first trains the full model at maximum capacity, then progressively co-trains smaller variants in three phases: Phase 1 (Elastic Kernel) shrinks kernel sizes, Phase 2 (Elastic Depth) allows shallower subnetworks, and Phase 3 (Elastic Width) reduces channel counts. In each phase, sampled subnetworks are trained with knowledge distillation from the full model, which stabilizes training.
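The phase schedule above can be sketched in a few lines. This is a minimal, hypothetical simulation of the training loop structure – the names (`FULL`, `PHASES`, `sample_subnet_config`) are illustrative, not OFA's actual API, and the real implementation trains PyTorch modules rather than plain configs:

```python
import random

# Each phase unlocks smaller values along one more dimension.
FULL = {"kernel": 7, "depth": 4, "width": 6}
PHASES = [
    ("elastic_kernel", {"kernel": [7, 5, 3], "depth": [4],       "width": [6]}),
    ("elastic_depth",  {"kernel": [7, 5, 3], "depth": [4, 3, 2], "width": [6]}),
    ("elastic_width",  {"kernel": [7, 5, 3], "depth": [4, 3, 2], "width": [6, 4, 3]}),
]

def sample_subnet_config(space, rng):
    """Sample one subnetwork from the currently unlocked space."""
    return {dim: rng.choice(choices) for dim, choices in space.items()}

def train_progressive_shrinking(train_step, steps_per_phase=100, seed=0):
    rng = random.Random(seed)
    # Phase 0: train the full network, which later acts as the teacher.
    for _ in range(steps_per_phase):
        train_step(FULL, teacher=None)
    # Phases 1-3: co-train sampled subnets, distilling from the full model.
    for name, space in PHASES:
        for _ in range(steps_per_phase):
            cfg = sample_subnet_config(space, rng)
            train_step(cfg, teacher=FULL)  # teacher logits guide the subnet
```

Note how the sampling space only ever grows: earlier-unlocked dimensions keep being sampled in later phases, so large subnets are not forgotten while small ones are introduced.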
Marketing Relevance
Central technique behind Once-for-All networks – enables training supernets that dynamically adapt to hardware constraints.
Example
In OFA, an ImageNet model is progressively shrunk: smaller kernels (7→5→3) are trained first, then shallower depths, and finally reduced channel widths. The result: one trained model, many deployment options.
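The 7→5→3 kernel step works by reusing weights of the larger kernel: a smaller kernel is taken from the center of the larger one. The sketch below shows this center-crop idea with NumPy; in OFA the cropped weights additionally pass through a small learned transformation matrix per shrink step, which is omitted here:

```python
import numpy as np

def center_crop_kernel(weight, k):
    """Derive a k x k kernel from the center of a larger conv kernel.

    weight: array of shape (out_ch, in_ch, K, K) with K >= k.
    OFA also applies a learned kernel transformation after cropping,
    left out of this sketch for brevity.
    """
    K = weight.shape[-1]
    start = (K - k) // 2
    return weight[..., start:start + k, start:start + k]

# Usage: shrink a 7x7 kernel to 5x5, then to 3x3.
w7 = np.random.randn(8, 3, 7, 7)
w5 = center_crop_kernel(w7, 5)
w3 = center_crop_kernel(w5, 3)
```

Because the small kernels share the center weights of the large one, training them does not start from scratch – which is exactly what makes the elastic-kernel phase cheap.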
Common Pitfalls
The multi-phase training pipeline is complex; the order in which dimensions are shrunk matters; and each phase requires careful hyperparameter tuning.
Origin & History
Introduced by Cai et al. (2020) as the core method of the Once-for-All framework. Inspired by curriculum learning and gradual pruning (Zhu & Gupta, 2017).
Comparisons & Differences
Progressive Shrinking vs. One-Shot NAS
One-Shot NAS trains all subnetworks simultaneously from the start; Progressive Shrinking introduces smaller subnetworks gradually, which makes supernet training more stable.
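The contrast comes down to how subnetworks are sampled during training. A hedged sketch of the two sampling strategies (the function names and the `phase` parameter are illustrative, not taken from any library):

```python
import random

KERNELS, DEPTHS, WIDTHS = [7, 5, 3], [4, 3, 2], [6, 4, 3]

def one_shot_sample(rng):
    # One-Shot NAS: the entire search space is active from step one.
    return (rng.choice(KERNELS), rng.choice(DEPTHS), rng.choice(WIDTHS))

def progressive_sample(phase, rng):
    # Progressive Shrinking: each dimension unlocks one phase at a time;
    # locked dimensions stay at their largest (full-network) value.
    k = rng.choice(KERNELS) if phase >= 1 else KERNELS[0]
    d = rng.choice(DEPTHS)  if phase >= 2 else DEPTHS[0]
    w = rng.choice(WIDTHS)  if phase >= 3 else WIDTHS[0]
    return (k, d, w)
```

Early in progressive training the weight updates target a narrow set of large subnets, so small subnets never interfere with the not-yet-converged full model – the source of the stability advantage.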