ELU (Exponential Linear Unit)
An activation function that passes positive inputs through unchanged and exponentially saturates negative inputs toward −α instead of cutting them off – smoother than ReLU, with mean activations pushed closer to zero.
Explanation
ELU: f(x) = x for x > 0, f(x) = α(eˣ − 1) for x ≤ 0. The parameter α (typically 1) sets the negative saturation value −α. The exponential branch gives smooth gradients (for α = 1 the derivative is continuous at 0) and pushes mean activations toward zero. Slightly more expensive than ReLU due to the exponential computation.
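A minimal NumPy sketch of this definition (the names elu and elu_grad are illustrative, not from any library):

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (exp(x) - 1) for x <= 0."""
    # expm1 is more accurate near 0; minimum() keeps exp() from
    # overflowing on the (unused) positive branch inside np.where.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def elu_grad(x, alpha=1.0):
    """Derivative: 1 for x > 0, alpha * exp(x) for x <= 0."""
    return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(elu(x))       # negatives saturate toward -alpha, positives pass through
print(elu_grad(x))  # gradient stays nonzero for negative inputs
```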
Marketing Relevance
ELU demonstrated that pushing activations toward zero mean can deliver part of batch normalization's benefit at a much smaller computational cost.
Origin & History
Clevert et al. (2015) introduced ELU and showed faster convergence than ReLU. SELU (Klambauer et al., 2017) extended ELU with self-normalizing properties.
Comparisons & Differences
ELU (Exponential Linear Unit) vs. ReLU
ReLU: non-smooth at 0, not zero-mean, and zero gradient for all negative inputs (risk of "dying" units); ELU: smooth, mean activations closer to zero, nonzero gradient for negatives – but more expensive due to the exponential.
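The contrast is easy to see numerically; a small sketch (the elu definition from above is repeated so the snippet runs on its own):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

x = np.linspace(-3.0, 3.0, 7)
print(relu(x))  # [0. 0. 0. 0. 1. 2. 3.] -> hard cutoff, gradient 0 for x < 0
print(elu(x))   # negatives decay smoothly toward -alpha instead of clipping
print(relu(x).mean(), elu(x).mean())  # ELU's output mean sits closer to zero
```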
ELU (Exponential Linear Unit) vs. SELU
ELU needs external normalization (e.g., batch normalization); SELU self-normalizes through fixed constants α ≈ 1.6733 and λ ≈ 1.0507 – but requires matching initialization (LeCun normal) to maintain the self-normalizing property.
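A sketch of the relationship: SELU is simply a scaled ELU with the fixed constants from Klambauer et al. (2017), hard-coded below. Feeding standard-normal pre-activations simulates what LeCun normal initialization is designed to produce:

```python
import numpy as np

# Fixed constants derived in the SELU paper (Klambauer et al., 2017)
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def selu(x):
    # SELU is a scaled ELU with these specific alpha/lambda values.
    return LAMBDA * elu(x, alpha=ALPHA)

# With standard-normal pre-activations (what LeCun normal init aims for),
# SELU returns activations with roughly zero mean and unit variance:
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = selu(x)
print(y.mean(), y.std())  # approximately 0 and 1
```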