Weight Normalization
Weight Normalization reparameterizes each weight vector into a direction and a magnitude, providing an alternative to Batch Normalization that requires no batch statistics.
Explanation
w = g · (v / ||v||), where g is a learned scalar magnitude and v is a parameter vector that supplies the direction. Simpler than BatchNorm (no running statistics to track), and applied directly to the weights rather than to the activations.
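As a minimal sketch, a weight-normalized linear layer can be written directly in NumPy. The function name, shapes, and initialization below are illustrative assumptions, not taken from the original paper.

```python
import numpy as np

def weight_norm_forward(v, g, x):
    """Forward pass of a linear layer with weight normalization.

    v : (out_features, in_features) direction parameters
    g : (out_features,) per-output-unit magnitude parameters
    x : (batch, in_features) input
    Effective weight: w = g * v / ||v||, with the norm taken per row of v.
    """
    v_norm = np.linalg.norm(v, axis=1, keepdims=True)  # (out_features, 1)
    w = g[:, None] * v / v_norm                        # reparameterized weight
    return x @ w.T                                     # (batch, out_features)

# Illustrative example: 4 inputs -> 3 outputs
rng = np.random.default_rng(0)
v = rng.normal(size=(3, 4))
g = np.ones(3)                  # g initialized to 1 here purely for illustration
x = rng.normal(size=(2, 4))
y = weight_norm_forward(v, g, x)
print(y.shape)                  # (2, 3)
```

In practice, frameworks provide this reparameterization out of the box; for example, PyTorch exposes it as torch.nn.utils.weight_norm.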
Marketing Relevance
Useful where BatchNorm is impractical (e.g., RNNs, generative models, reinforcement learning), since no batch statistics are required.
Origin & History
Salimans & Kingma (OpenAI, 2016) introduced Weight Normalization. It found use in WaveNet (2016) and in some RL systems. It is less common than BatchNorm or LayerNorm, but remains conceptually influential.
Comparisons & Differences
Weight Normalization vs. Batch Normalization
BatchNorm normalizes activations (needs batch statistics); WeightNorm normalizes weights directly (no batch dependency).
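To make the contrast concrete, here is a short NumPy sketch (all names and shapes are illustrative): the BatchNorm-style step uses statistics computed across the batch, while the WeightNorm-style step touches only the weight matrix and leaves each sample independent.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 4))             # a batch of 8 activation vectors
w = rng.normal(size=(4, 4))             # weight matrix of a linear layer

# BatchNorm-style: normalize activations using statistics of the current batch
mu, sigma = x.mean(axis=0), x.std(axis=0) + 1e-5
x_bn = (x - mu) / sigma                 # each sample depends on the rest of the batch

# WeightNorm-style: normalize the weights themselves; no batch statistics involved
w_wn = w / np.linalg.norm(w, axis=1, keepdims=True)
y = x @ w_wn.T                          # each sample is processed independently
```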