Tanh (Hyperbolic Tangent)
An activation function that maps values to the range [-1, 1] – like sigmoid, but zero-centered.
Tanh maps values to [-1, 1] – zero-centered, unlike sigmoid, but equally smooth. Still standard inside LSTM/GRU cells, largely replaced by ReLU in feed-forward networks.
Explanation
Tanh is a scaled and shifted sigmoid: tanh(x) = 2σ(2x) - 1. Because its outputs are centered around zero, the next layer receives inputs with mixed signs, so weight gradients are not all pushed in the same direction – this improves gradient flow and convergence compared to sigmoid.
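A minimal NumPy sketch (the `sigmoid` helper is illustrative, not from any particular library) that checks this identity numerically:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 101)

# tanh as a scaled, shifted sigmoid: tanh(x) = 2*sigmoid(2x) - 1
assert np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)

# Zero-centered outputs, symmetric around the origin
print(np.tanh([-3.0, 0.0, 3.0]))  # ~ [-0.995, 0.0, 0.995]
```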
Marketing Relevance
Tanh was long the standard in RNNs and LSTMs. It has been replaced by ReLU/GELU in modern architectures, but remains relevant in certain contexts (e.g., the candidate activations in LSTM/GRU cells, where the gates themselves use sigmoid).
Common Pitfalls
Vanishing gradient problem at extreme values: tanh saturates for |x| > 3, where the derivative 1 - tanh²(x) drops below 0.01, so hardly any gradient flows back through deep stacks. Also more computationally expensive than ReLU.
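A short sketch of this saturation effect, using the closed-form derivative 1 - tanh²(x):

```python
import numpy as np

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    t = np.tanh(x)
    return 1.0 - t * t

for x in [0.0, 1.0, 3.0, 5.0]:
    print(f"x = {x}: gradient = {tanh_grad(x):.4f}")

# x = 0.0: gradient = 1.0000   (maximum slope)
# x = 1.0: gradient = 0.4200
# x = 3.0: gradient = 0.0099   (saturated: almost no gradient flows back)
# x = 5.0: gradient = 0.0002
```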
Origin & History
Tanh became popular in the 1990s as an improvement over sigmoid (LeCun, 1998). The zero-centered outputs improved convergence. With the rise of ReLU (around 2010), its importance in feed-forward networks decreased, but tanh remains standard inside LSTM/GRU cells, where it produces the candidate values while the gates use sigmoid.
Comparisons & Differences
Tanh (Hyperbolic Tangent) vs. Sigmoid
Sigmoid maps to [0, 1] (not zero-centered); Tanh to [-1, 1] (zero-centered) – Tanh often converges faster.
Tanh (Hyperbolic Tangent) vs. ReLU
ReLU is faster to compute and avoids vanishing gradients for positive values. Tanh is smoother but saturates at extreme inputs.
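To make the difference concrete, an illustrative comparison of the two gradients for growing positive inputs:

```python
import numpy as np

x = np.array([0.5, 2.0, 5.0, 10.0])

relu_grad = (x > 0).astype(float)     # constant 1 for every positive input
tanh_grad = 1.0 - np.tanh(x) ** 2     # shrinks toward 0 as |x| grows

print(relu_grad)  # [1. 1. 1. 1.]
print(tanh_grad)  # ~ [7.9e-01 7.1e-02 1.8e-04 8.2e-09]
```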