FSDP (Fully Sharded Data Parallel)
PyTorch's native implementation of parameter sharding – distributes model parameters, gradients, and optimizer states across GPUs for memory-efficient training.
FSDP is PyTorch's native parameter sharding solution – each GPU holds only 1/N of the parameters, enabling training of massive models without DeepSpeed.
Explanation
FSDP shards all model parameters: each GPU holds only 1/N of them. Before each forward and backward pass, the parameters that are needed are gathered via AllGather and released again after computation. Conceptually this is identical to DeepSpeed ZeRO-3, but native to PyTorch.
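A minimal sketch of this flow using PyTorch's FSDP1 API (FullyShardedDataParallel). The toy model, dimensions, and the assumption of a `torchrun --nproc_per_node=N` launch are illustrative choices, not taken from the text above:

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

# Assumes launch via torchrun, which sets rank/world-size environment variables.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Placeholder model for illustration only.
model = nn.Sequential(
    nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)
).cuda()

# FULL_SHARD shards parameters, gradients, and optimizer states
# (the ZeRO-3-equivalent behavior described above).
model = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device="cuda")
loss = model(x).sum()   # parameters are all-gathered just in time, then freed
loss.backward()         # gradients are reduce-scattered back to the shards
optimizer.step()        # each rank updates only its own parameter shard
```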
Marketing Relevance
FSDP is the new standard for LLM training in PyTorch – replaces DDP for large models and provides memory efficiency without external libraries.
Example
Llama-2 training uses FSDP: a 70B model is sharded across 512 GPUs, so each GPU holds only ~280 MB of parameter shards (at 2 bytes per parameter) instead of the full 140 GB copy. Training scales nearly linearly.
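A back-of-the-envelope check of these numbers (70B parameters, 512 GPUs, 2 bytes per parameter as implied by the 140 GB figure):

```python
params = 70e9          # 70B parameters
gpus = 512
bytes_per_param = 2    # bf16/fp16

full_copy_gb = params * bytes_per_param / 1e9       # ~140 GB for an unsharded copy
shard_mb = params / gpus * bytes_per_param / 1e6     # ~273 MB per GPU shard
print(f"full copy: {full_copy_gb:.0f} GB, per-GPU shard: {shard_mb:.0f} MB")
```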
Common Pitfalls
Configuration is complex (sharding strategy, mixed precision, CPU offloading; see the sketch below). Debugging is harder than with DDP. Not all custom layers are FSDP-compatible. For small models, the communication overhead outweighs the memory savings.
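A hedged sketch of the configuration surface mentioned above, again with the FSDP1 API. The `Block` class is a hypothetical stand-in for a transformer layer, and the process group is assumed to be initialized as in the earlier sketch:

```python
import functools
import torch
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    ShardingStrategy,
    MixedPrecision,
    CPUOffload,
)
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

class Block(nn.Module):
    """Hypothetical transformer-style block, for illustration only."""
    def __init__(self, dim=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
    def forward(self, x):
        x = x + self.attn(x, x, x, need_weights=False)[0]
        return x + self.mlp(x)

model = nn.Sequential(*[Block() for _ in range(4)]).cuda()

mp_policy = MixedPrecision(
    param_dtype=torch.bfloat16,    # gather/compute parameters in bf16
    reduce_dtype=torch.bfloat16,   # reduce-scatter gradients in bf16
    buffer_dtype=torch.bfloat16,
)

model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,     # shard params, grads, optimizer state
    mixed_precision=mp_policy,
    cpu_offload=CPUOffload(offload_params=True),       # optional: park idle shards on CPU
    auto_wrap_policy=functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={Block},                 # wrap each block as its own FSDP unit
    ),
)
```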
Origin & History
FairScale (Meta, 2021) shipped the first FSDP implementation. PyTorch integrated FSDP natively in v1.12 (2022). FSDP2 (2024) simplified the API and improved performance. Meta uses FSDP for all Llama training runs.
Comparisons & Differences
FSDP (Fully Sharded Data Parallel) vs. DeepSpeed ZeRO
FSDP: PyTorch-native, simpler integration. ZeRO: More features (ZeRO-Infinity, Expert Parallelism), better scaling beyond 1000 GPUs.