Sparse Training
Training with sparsity from the start: instead of "train dense, then prune," the model stays sparse from the beginning, with connections dynamically added and removed.
Because the network is sparse throughout training, FLOPs are saved during training itself, not just at inference.
Explanation
Methods like SET (Mocanu et al., 2018) and RigL (Evci et al., 2020) keep the overall sparsity level fixed during training but regularly swap connections: low-magnitude weights are dropped and new connections are grown – at random in SET, by largest gradient magnitude in RigL. Because the network is never dense, this saves FLOPs during training itself.
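A minimal sketch of one such drop-and-grow step, assuming PyTorch and a per-tensor binary mask; the function and parameter names (`rigl_step`, `swap_frac`) are illustrative, not the authors' implementation:

```python
import torch

def rigl_step(weight: torch.Tensor, mask: torch.Tensor, dense_grad: torch.Tensor,
              swap_frac: float = 0.3) -> torch.Tensor:
    """One drop-and-grow update in the spirit of RigL (illustrative sketch only).

    The number of active connections stays constant: the lowest-magnitude active
    weights are dropped, and the same number of currently inactive connections
    with the largest dense-gradient magnitude are regrown.
    """
    n_swap = int(swap_frac * mask.sum().item())

    # Drop: among currently active weights, pick the smallest magnitudes.
    drop_scores = torch.where(mask.bool(), weight.abs(),
                              torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(drop_scores.view(-1), n_swap, largest=False).indices
    mask.view(-1)[drop_idx] = 0.0

    # Grow: among currently inactive weights, pick the largest gradient magnitudes
    # (RigL periodically computes a dense gradient just for this selection).
    grow_scores = torch.where(mask.bool(), torch.full_like(dense_grad, -float("inf")),
                              dense_grad.abs())
    grow_idx = torch.topk(grow_scores.view(-1), n_swap, largest=True).indices
    mask.view(-1)[grow_idx] = 1.0
    weight.data.view(-1)[grow_idx] = 0.0   # regrown connections start at zero

    return mask
```

Between these updates, training proceeds as usual with the mask applied to the weights after every optimizer step; SET works the same way except that regrown connections are chosen at random rather than by gradient magnitude.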
Marketing Relevance
Sparse training promises efficiency not just at inference but also during training – at 90% sparsity this could in principle mean up to roughly 10x cheaper LLM pre-training, but only if the hardware can actually exploit the sparsity.
Example
In the RigL paper, ResNet-50 trained at 90% sparsity reaches roughly 75% top-1 accuracy on ImageNet – close to the dense baseline – while using about 5x fewer FLOPs during training.
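A rough back-of-envelope view of where the savings come from (a simplification that assumes FLOPs scale with the fraction of active weights, which is why the ideal factor is larger than the ~5x reported in practice):

```python
sparsity = 0.90                      # 90% of connections are inactive
active_fraction = 1.0 - sparsity     # 10% of connections carry FLOPs

ideal_speedup = 1.0 / active_fraction
print(f"ideal FLOP reduction at {sparsity:.0%} sparsity: {ideal_speedup:.0f}x")  # 10x
# In practice the reduction is smaller (~5x in the RigL example) because, e.g.,
# some layers or operations stay dense and sparsity is spread unevenly across layers.
```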
Common Pitfalls
Current GPUs are poorly optimized for unstructured sparsity, so theoretical FLOP savings rarely translate into wall-clock speedups (see the sketch below). Dynamic connection routing adds its own overhead, and for transformers/LLMs the approach is still early-stage research.
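A quick way to see the hardware pitfall: applying an unstructured 90% sparsity mask to a weight matrix does not make the GPU matmul faster, because the kernel is still dense. An illustrative benchmark, assuming PyTorch; results will vary by hardware:

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4096, 4096, device=device)
w = torch.randn(4096, 4096, device=device)
mask = (torch.rand_like(w) > 0.9).float()   # keep ~10% of weights (90% sparsity)

def bench(fn, iters=50):
    for _ in range(5):                       # warm-up
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

dense_ms = bench(lambda: x @ w) * 1e3
masked_ms = bench(lambda: x @ (w * mask)) * 1e3   # mask applied, kernel still dense
print(f"dense matmul:  {dense_ms:.2f} ms")
print(f"masked matmul: {masked_ms:.2f} ms  (no speedup; the extra multiply adds overhead)")
```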
Origin & History
Mocanu et al. introduced SET (Sparse Evolutionary Training) in 2018. Evci et al. (Google, 2020) published RigL, which comes close to dense accuracy at 90% sparsity. NVIDIA is exploring hardware support for sparsity; the Ampere Sparse Tensor Cores accelerate 2:4 structured sparsity.
Comparisons & Differences
Sparse Training vs. Post-Training Pruning
Post-training pruning removes weights after dense training; Sparse Training keeps the model sparse from the start.
Sparse Training vs. Lottery Ticket Hypothesis
The Lottery Ticket Hypothesis finds sparse subnetworks through iterative prune-and-retrain cycles; Sparse Training discovers them dynamically during a single training run.