
    ReLU (Rectified Linear Unit)

    Also known as:
    ReLU
    Rectified Linear Unit
    Rectifier
    Updated: 2/9/2026

    ReLU is the most widely used activation function in deep learning: f(x) = max(0, x). It is simple, fast to compute, and effective against vanishing gradients.

    Quick Summary

    ReLU = max(0, x): the simplest and most widely used activation function, and a key enabler of deep learning because it avoids vanishing gradients.

    Explanation

    ReLU passes positive values through unchanged and sets negative values to 0. Because its gradient is exactly 1 for positive inputs, it avoids the vanishing gradients that plague Sigmoid and Tanh, which accelerates training. Common variants: Leaky ReLU, PReLU, GELU, SiLU/Swish.
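    To make this concrete, here is a minimal sketch in Python/NumPy (our own illustration; the function names are not from any particular library) of ReLU and the Leaky ReLU variant:

        import numpy as np

        def relu(x):
            # f(x) = max(0, x): positive inputs pass through, negatives become 0
            return np.maximum(0.0, x)

        def leaky_relu(x, alpha=0.01):
            # Leaky ReLU keeps a small slope alpha for negative inputs,
            # so neurons can never go completely "dead" (gradient is never 0)
            return np.where(x > 0, x, alpha * x)

        x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
        print(relu(x))        # [0.   0.   0.   0.5  2.  ]
        print(leaky_relu(x))  # [-0.02  -0.005  0.    0.5   2.  ]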

    Marketing Relevance

    ReLU was key to deep learning's success: without it, very deep networks would have been far harder to train.

    Origin & History

    Rectifier-style activations were described as early as the 1960s, but Nair & Hinton (2010) first demonstrated their effectiveness for deep networks, and AlexNet (2012) used ReLU in its ImageNet breakthrough. GELU (Hendrycks & Gimpel, 2016) and SiLU/Swish (2017) are smoother variants that became standard in transformers (GPT, BERT).

    Comparisons & Differences

    ReLU (Rectified Linear Unit) vs. Sigmoid

    ReLU: no vanishing gradient, cheap to compute, but neurons can "die" (get stuck outputting 0 with zero gradient). Sigmoid: smooth output in (0, 1), but its derivative is at most 0.25 and saturates for large inputs, which stalls training in deep networks.
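    The gradient difference is easy to verify numerically. A short sketch (our own illustration): the sigmoid's derivative peaks at 0.25 and decays quickly, while ReLU's derivative is exactly 1 for any positive input:

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def sigmoid_grad(x):
            s = sigmoid(x)
            return s * (1.0 - s)          # at most 0.25 (at x = 0), vanishes for large |x|

        def relu_grad(x):
            return (x > 0).astype(float)  # exactly 1 for x > 0, else 0

        x = np.array([0.0, 2.0, 5.0, 10.0])
        print(sigmoid_grad(x))  # ~[0.25, 0.105, 0.0066, 0.000045]
        print(relu_grad(x))     # [0. 1. 1. 1.]

        # Chained through 10 sigmoid layers, even the best case shrinks to
        # 0.25**10 ~ 1e-6: the vanishing-gradient problem that ReLU avoids.
        print(0.25 ** 10)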

    ReLU (Rectified Linear Unit) vs. GELU

    ReLU has a hard kink at 0; GELU is smooth and probabilistically motivated (x · Φ(x), where Φ is the standard normal CDF) and is the standard choice in transformers (GPT, BERT).
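    For comparison, a short sketch (our own illustration) of the exact GELU next to ReLU, showing how GELU bends smoothly around 0 instead of clipping hard:

        import math

        def gelu(x):
            # Exact GELU: x * Phi(x), with Phi the standard normal CDF
            return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

        for x in [-2.0, -0.1, 0.0, 0.1, 2.0]:
            print(f"x={x:5.1f}  relu={max(0.0, x):7.4f}  gelu={gelu(x):7.4f}")

        # Near 0, GELU is smooth (gelu(-0.1) ~ -0.046) where ReLU clips to 0;
        # for large |x| the two functions nearly coincide.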

