SELU (Scaled Exponential Linear Unit)
A self-normalizing activation function that drives layer activations toward mean 0 and variance 1 on its own – no batch/layer norm needed.
SELU self-normalizes through special scaling – no batch/layer norm needed, but it comes with strict architecture requirements.
Explanation
SELU(x) = λ · ELU(x, α) with mathematically derived constants λ ≈ 1.0507 and α ≈ 1.6733. The self-normalizing property only holds under specific conditions: LeCun-normal weight initialization and a matching dropout variant (Alpha Dropout). Theoretically elegant, but in practice often hard to carry over to other architectures.
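A minimal sketch of this recipe, assuming PyTorch (layer sizes, dropout rate, and the helper name lecun_normal_ are illustrative): SELU activations combined with Alpha Dropout and LeCun-normal initialization, the setup under which the self-normalizing behavior was derived.

```python
import torch
import torch.nn as nn

def lecun_normal_(layer: nn.Linear) -> None:
    # LeCun-normal init: weights ~ N(0, 1/fan_in), biases at zero.
    fan_in = layer.in_features
    nn.init.normal_(layer.weight, mean=0.0, std=fan_in ** -0.5)
    nn.init.zeros_(layer.bias)

class SNNBlock(nn.Module):
    def __init__(self, d_in: int, d_out: int, p_drop: float = 0.05):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        lecun_normal_(self.linear)
        self.act = nn.SELU()                 # uses lambda ~ 1.0507, alpha ~ 1.6733 internally
        self.drop = nn.AlphaDropout(p_drop)  # dropout variant that preserves mean/variance

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.drop(self.act(self.linear(x)))

# With standardized inputs, activations stay close to mean 0 / variance 1 layer by layer.
x = torch.randn(1024, 256)
net = nn.Sequential(*[SNNBlock(256, 256) for _ in range(8)])
h = net(x)
print(h.mean().item(), h.var().item())  # roughly 0 and 1
```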
Marketing Relevance
SELU showed that normalization can be built into the activation function itself – this inspired research on norm-free architectures.
Origin & History
Klambauer et al. (2017) mathematically proved that SELU networks are self-normalizing. The paper drew attention, but practical constraints (the guarantees were derived for fully connected networks, and special initialization is required) held back adoption.
Comparisons & Differences
SELU (Scaled Exponential Linear Unit) vs. ELU
ELU on its own does not normalize; SELU applies a fixed scale λ (and a larger α) to ELU so that activations settle at mean 0 and variance 1, as the sketch below shows.
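A small NumPy sketch of this relationship (the constants are the published SELU values; the helper names elu and selu are illustrative), showing that SELU is exactly ELU with a larger α, multiplied by λ:

```python
import numpy as np

ALPHA, LAMBDA = 1.6732632423543772, 1.0507009873554805  # constants from the SELU paper

def elu(x, alpha):
    # ELU: identity for x > 0, alpha * (exp(x) - 1) for x <= 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def selu(x):
    # SELU is simply lambda-scaled ELU with the derived alpha
    return LAMBDA * elu(x, ALPHA)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(elu(x, 1.0))  # plain ELU (alpha = 1): no normalizing guarantee
print(selu(x))      # SELU: scaling keeps the fixed point at mean 0, variance 1
```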