Once-for-All (OFA)
A training method that trains a single "supernet" from which specialized subnetworks can be extracted for different hardware constraints without retraining – train once, deploy everywhere.
Explanation
OFA trains a large supernet with progressive shrinking: the full network is trained first, then kernel size, depth, and width are progressively made elastic so that smaller subnets share the supernet's weights. The resulting supernet contains an enormous number of possible subnets (roughly 10^19 in the original paper), from which one can be selected for a specific hardware budget without retraining.
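The sketch below is a minimal, self-contained illustration (assuming PyTorch, and not the official OFA code) of the underlying weight-sharing idea: a single "elastic" convolution whose active kernel size and output width are slices of one shared weight tensor, so shrunk subnets reuse the supernet's weights. The real implementation additionally uses kernel transformation matrices, channel sorting by importance, and elastic depth, all omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticConv2d(nn.Module):
    """One shared weight tensor that serves every (kernel size, width) choice."""

    def __init__(self, in_ch, max_out_ch, max_kernel=7):
        super().__init__()
        self.max_kernel = max_kernel
        # Largest weight; smaller subnets reuse its center crop and leading channels.
        self.weight = nn.Parameter(
            torch.randn(max_out_ch, in_ch, max_kernel, max_kernel) * 0.01
        )

    def forward(self, x, kernel_size=7, out_ch=None):
        out_ch = out_ch or self.weight.shape[0]
        # Elastic kernel: use the centered k x k patch of the full kernel.
        start = (self.max_kernel - kernel_size) // 2
        w = self.weight[:out_ch, :, start:start + kernel_size, start:start + kernel_size]
        return F.conv2d(x, w, padding=kernel_size // 2)

supernet_layer = ElasticConv2d(in_ch=3, max_out_ch=64, max_kernel=7)
x = torch.randn(1, 3, 32, 32)

full = supernet_layer(x, kernel_size=7, out_ch=64)   # largest subnet
small = supernet_layer(x, kernel_size=3, out_ch=16)  # shrunk subnet, same shared weights
print(full.shape, small.shape)
```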
Marketing Relevance
OFA solves the "one model per device" problem: instead of training 100 separate models for 100 devices, you train once and extract specialized versions for smartphones, tablets, edge servers, and the cloud.
Example
MIT's HAN Lab trained an OFA network from which models for any latency budget can be extracted in seconds – from a Raspberry Pi (20 ms) to a server GPU (5 ms), all from the same supernet.
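As a hedged illustration of the deployment step, the following sketch picks a subnet configuration for a given latency budget from a precomputed per-device latency table; `SubnetConfig` and `pick_subnet` are hypothetical helpers, and all configurations, latencies, and accuracy values are made-up placeholders rather than measurements from the paper.

```python
from dataclasses import dataclass

@dataclass
class SubnetConfig:
    depth: int           # blocks per stage
    width_mult: float    # channel width multiplier
    kernel_size: int
    est_accuracy: float  # e.g. from an accuracy predictor (assumed)
    latency_ms: float    # e.g. from a per-device latency lookup table (assumed)

# Placeholder candidate subnets for one target device.
candidates = [
    SubnetConfig(4, 1.00, 7, est_accuracy=0.80, latency_ms=28.0),
    SubnetConfig(3, 1.00, 5, est_accuracy=0.78, latency_ms=19.0),
    SubnetConfig(3, 0.75, 5, est_accuracy=0.76, latency_ms=12.0),
    SubnetConfig(2, 0.75, 3, est_accuracy=0.73, latency_ms=6.0),
]

def pick_subnet(candidates, budget_ms):
    """Return the most accurate candidate that meets the latency budget."""
    feasible = [c for c in candidates if c.latency_ms <= budget_ms]
    if not feasible:
        raise ValueError(f"No subnet fits a {budget_ms} ms budget")
    return max(feasible, key=lambda c: c.est_accuracy)

print(pick_subnet(candidates, budget_ms=20.0))  # mid-sized config for a phone-class budget
print(pick_subnet(candidates, budget_ms=7.0))   # small config for a tight edge budget
```

Because the supernet's weights already cover every configuration, the chosen subnet only needs to be sliced out of the supernet, not retrained.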
Common Pitfalls
Supernet training is very expensive (on the order of GPU-days). Extracted subnets are not always optimal – a model trained specifically for the target hardware can perform better. The training pipeline (progressive shrinking, subnet sampling, distillation) is complex.
Origin & History
Cai et al. (MIT HAN Lab) published the OFA paper "Once-for-All: Train One Network and Specialize It for Efficient Deployment" at ICLR 2020. The method won multiple NAS competitions and inspired later elastic training approaches.
Comparisons & Differences
Once-for-All (OFA) vs. Neural Architecture Search
NAS trains and evaluates many candidates individually; OFA trains one supernet and extracts candidates without retraining.
Once-for-All (OFA) vs. Knowledge Distillation
Distillation trains a separate small model to mimic a large one; OFA embeds many small models within one large network (and in fact uses distillation from the full supernet during progressive shrinking to train them).