Leaky ReLU
A variant of ReLU that lets negative inputs pass scaled by a small factor (e.g., 0.01) instead of setting them to 0, which prevents the dead-neuron problem at minimal extra cost.
Explanation
Leaky ReLU: f(x) = x for x > 0, f(x) = αx for x ≤ 0 (typically α = 0.01). The small but nonzero gradient α for negative inputs ensures neurons can never fully "die" as they can with standard ReLU, whose gradient is exactly 0 there. Simple to implement, with minimal computational overhead.
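A minimal NumPy sketch of this definition (the function names are illustrative, not from a specific library):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # f(x) = x for x > 0, alpha * x for x <= 0
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Derivative: 1 for x > 0, alpha for x <= 0 -- never exactly 0,
    # so the neuron always receives some gradient signal.
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(leaky_relu(x))       # [-0.02  -0.005  0.     0.5    2.   ]
print(leaky_relu_grad(x))  # [0.01 0.01 0.01 1.   1.  ]
```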
Marketing Relevance
An important improvement over ReLU in GANs and very deep networks, where dead neurons are a common source of training problems.
Common Pitfalls
The leak factor α is an additional hyperparameter that must be chosen. Leaky ReLU is not consistently better than standard ReLU, and in Transformers GELU or SwiGLU is usually preferred.
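As an illustration, in PyTorch the leak factor is exposed as the negative_slope argument; the 0.2 value below is a common choice in DCGAN-style discriminators, not a universal default:

```python
import torch
import torch.nn as nn

# The leak factor is a hyperparameter; PyTorch calls it negative_slope.
act_default  = nn.LeakyReLU()                    # negative_slope=0.01 by default
act_stronger = nn.LeakyReLU(negative_slope=0.2)  # heavier leak, common in DCGAN discriminators

x = torch.tensor([-1.0, 0.5])
print(act_default(x))    # tensor([-0.0100, 0.5000])
print(act_stronger(x))   # tensor([-0.2000, 0.5000])
```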
Origin & History
Maas et al. (2013) introduced Leaky ReLU. It became especially popular in GANs (DCGAN, 2015), where dead neurons destabilize training. PReLU (He et al., 2015) made the leak factor learnable.
Comparisons & Differences
Leaky ReLU vs. ReLU
ReLU sets negative values to 0, so neurons can die (zero gradient for all negative inputs); Leaky ReLU multiplies them by a small α, so every neuron keeps a nonzero gradient and continues to learn.
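A small PyTorch sketch of this gradient difference (the input values are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 3.0], requires_grad=True)

# Standard ReLU: the negative input gets exactly zero gradient.
F.relu(x).sum().backward()
print(x.grad)   # tensor([0., 1.])

# Leaky ReLU: the negative input still receives a small, nonzero gradient.
x.grad = None
F.leaky_relu(x, negative_slope=0.01).sum().backward()
print(x.grad)   # tensor([0.0100, 1.0000])
```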
Leaky ReLU vs. PReLU
Leaky ReLU uses a fixed leak factor (e.g., 0.01); PReLU learns the factor per channel from the data.
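A brief PyTorch sketch of the difference (the tensor shapes are arbitrary):

```python
import torch
import torch.nn as nn

# PReLU (He et al., 2015): one learnable leak factor per channel.
prelu = nn.PReLU(num_parameters=3, init=0.25)   # 3 channels, alpha initialized to 0.25
x = torch.randn(2, 3, 8, 8)
out = prelu(x)                                  # alpha is updated by backprop like any other weight
print(prelu.weight.shape)                       # torch.Size([3])

# Leaky ReLU: the leak factor is a fixed hyperparameter and is never trained.
leaky = nn.LeakyReLU(negative_slope=0.01)
```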