Weight Initialization
Weight initialization determines the starting values of a network's parameters and is critical for stable training and fast convergence. The two standard schemes are Xavier/Glorot for Sigmoid/Tanh activations and He/Kaiming for ReLU.
Explanation
Xavier/Glorot initialization (2010) scales the initial weight variance to a layer's fan-in and fan-out so that signal variance is roughly preserved through Sigmoid/Tanh layers; He/Kaiming initialization (2015) does the same for ReLU. A wrong choice causes vanishing or exploding gradients from the very first training step. Modern frameworks pick a suitable default automatically.
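In practice this amounts to a one-line call per layer. A minimal PyTorch sketch (the layer sizes are illustrative assumptions):

```python
import torch.nn as nn

tanh_layer = nn.Linear(256, 128)  # feeds a Tanh activation
relu_layer = nn.Linear(256, 128)  # feeds a ReLU activation

# Xavier/Glorot: Var(W) = 2 / (fan_in + fan_out), suited to Sigmoid/Tanh
nn.init.xavier_uniform_(tanh_layer.weight)

# He/Kaiming: Var(W) = 2 / fan_in, compensates for ReLU zeroing negatives
nn.init.kaiming_normal_(relu_layer.weight, nonlinearity="relu")

# Biases are commonly started at zero
nn.init.zeros_(tanh_layer.bias)
nn.init.zeros_(relu_layer.bias)
```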
Marketing Relevance
Correct initialization is a prerequisite for successful training; the choice of scheme is an often underestimated hyperparameter.
Origin & History
Xavier/Glorot initialization (2010) solved training problems in Sigmoid/Tanh networks. He/Kaiming initialization (2015) was developed for ReLU networks. Fixup initialization (2019) enabled training deep residual networks without normalization layers. Modern transformers use dedicated initialization strategies such as μP (2022).
Comparisons & Differences
Xavier vs. He Initialization
Xavier suits symmetric activations (Sigmoid/Tanh) and scales the weight variance as 2/(fan_in + fan_out); He targets ReLU and uses the larger 2/fan_in, doubling the variance to compensate for ReLU zeroing out the negative half of its inputs.
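The practical difference shows up in deep ReLU stacks. A small NumPy experiment (width, depth, and batch size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = fan_out = 512
x = rng.standard_normal((1000, fan_in))

def forward(x, scheme, depth=20):
    """Push x through `depth` random Linear+ReLU layers under one init scheme."""
    h = x
    for _ in range(depth):
        if scheme == "xavier":
            std = np.sqrt(2.0 / (fan_in + fan_out))  # Var(W) = 2/(fan_in+fan_out)
        else:
            std = np.sqrt(2.0 / fan_in)              # Var(W) = 2/fan_in (He)
        W = rng.standard_normal((fan_in, fan_out)) * std
        h = np.maximum(h @ W, 0.0)                   # ReLU cuts off the negative half
    return h

print("Xavier + ReLU:", forward(x, "xavier").std())  # shrinks toward zero
print("He     + ReLU:", forward(x, "he").std())      # stays roughly stable
```

Under Xavier, each ReLU layer roughly halves the signal's second moment, so activations collapse after a few dozen layers; He's factor of 2 cancels exactly that loss.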