Group Normalization
Group Normalization divides the channels of a feature map into groups and normalizes within each group. Because its statistics are computed per sample rather than per batch, it is batch-independent, making it the normalization of choice for small batch sizes in detection and segmentation.
Explanation
GN divides the C channels into G groups (e.g., G = 32). For each sample, normalization is computed over the H×W×(C/G) elements of a group, so no batch statistics are involved. GN is therefore independent of batch size and stable for detection and segmentation, where large images often force batch sizes of 1 to 2.
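A minimal sketch in PyTorch (tensor sizes chosen for illustration) that reproduces nn.GroupNorm by hand: reshape to (N, G, C/G, H, W), normalize each group per sample, then apply the learnable per-channel scale and shift.

    import torch
    import torch.nn as nn

    N, C, H, W = 2, 64, 32, 32   # illustrative sizes
    G = 32                       # number of groups; must divide C
    x = torch.randn(N, C, H, W)

    gn = nn.GroupNorm(num_groups=G, num_channels=C)
    y_ref = gn(x)

    # Manual GN: per sample, normalize each group over its (C/G)*H*W elements.
    xg = x.view(N, G, C // G, H, W)
    mean = xg.mean(dim=(2, 3, 4), keepdim=True)
    var = xg.var(dim=(2, 3, 4), keepdim=True, unbiased=False)
    y = ((xg - mean) / torch.sqrt(var + gn.eps)).view(N, C, H, W)
    # Learnable per-channel affine parameters (gamma, beta), identity at init.
    y = y * gn.weight.view(1, C, 1, 1) + gn.bias.view(1, C, 1, 1)

    print(torch.allclose(y, y_ref, atol=1e-6))  # True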
Marketing Relevance
The standard normalization in object detection and segmentation, where small batches make BatchNorm statistics unstable.
Common Pitfalls
The number of groups G is a hyperparameter (32 is the common default), and G must divide the channel count C. GN is also not universally superior: with large batches, BatchNorm often still performs better.
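As an illustration of the divisibility constraint: PyTorch's nn.GroupNorm, for instance, rejects a group count that does not divide the channel count at construction time.

    import torch.nn as nn

    nn.GroupNorm(num_groups=32, num_channels=64)      # fine: 64 % 32 == 0
    try:
        nn.GroupNorm(num_groups=32, num_channels=48)  # 48 % 32 != 0
    except ValueError as e:
        print(e)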
Origin & History
Wu & He (Facebook AI, 2018) introduced Group Normalization. It became standard in Detectron2 and modern detection frameworks. MAE (2022) and other self-supervised methods also use GN.
Comparisons & Differences
Group Normalization vs. Batch Normalization
BatchNorm normalizes each channel across the batch dimension, so its statistics become noisy with small batches; GroupNorm normalizes within channel groups per sample and is therefore batch-independent.
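A small sketch of which axes the statistics are taken over (shapes are illustrative): BatchNorm's per-channel mean mixes all samples, while GroupNorm's per-group mean is computed within a single sample.

    import torch

    N, C, H, W, G = 8, 64, 32, 32, 32
    x = torch.randn(N, C, H, W)

    # BatchNorm: one mean per channel, computed across (N, H, W).
    bn_mean = x.mean(dim=(0, 2, 3))          # shape (C,): depends on every sample
    # GroupNorm: one mean per (sample, group), computed across (C/G, H, W).
    gn_mean = x.view(N, G, -1).mean(dim=2)   # shape (N, G): per-sample statistics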
Group Normalization vs. Layer Normalization
LayerNorm normalizes all channels together; GroupNorm splits them into groups, making it a middle ground between LayerNorm (G = 1) and InstanceNorm (G = C), as the sketch below shows.
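A sketch of the two limiting cases, using PyTorch layers with affine transforms disabled or left at their identity initialization:

    import torch
    import torch.nn as nn

    N, C, H, W = 2, 8, 4, 4
    x = torch.randn(N, C, H, W)

    # G = 1: a single group spanning all channels, i.e. LayerNorm over (C, H, W).
    gn1 = nn.GroupNorm(num_groups=1, num_channels=C)
    ln = nn.LayerNorm([C, H, W], elementwise_affine=False)
    print(torch.allclose(gn1(x), ln(x), atol=1e-5))     # True

    # G = C: one channel per group, i.e. InstanceNorm.
    gnC = nn.GroupNorm(num_groups=C, num_channels=C)
    inorm = nn.InstanceNorm2d(C, affine=False)
    print(torch.allclose(gnC(x), inorm(x), atol=1e-5))  # True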