
    SELU (Scaled Exponential Linear Unit)

    Also known as:
    SELU
    Self-Normalizing Activation
    Scaled ELU
    Updated: 2/12/2026

    A self-normalizing activation function that drives layer outputs toward zero mean and unit variance on its own, so no batch or layer normalization is needed.

    Quick Summary

    SELU self-normalizes through fixed scaling constants, so no batch or layer normalization is needed, but it imposes strict architecture requirements.

    Explanation

    SELU(x) = λ · ELU(x, α), i.e. λ · x for x > 0 and λ · α · (eˣ − 1) for x ≤ 0, with mathematically derived constants λ ≈ 1.0507 and α ≈ 1.6733. It requires LeCun-normal weight initialization and its own dropout variant (Alpha Dropout). Theoretically elegant, but in practice hard to apply beyond fully connected architectures.
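    A minimal NumPy sketch of the idea (the layer width and function names are illustrative, not from the source): the constants are the ones published by Klambauer et al. (2017), and the loop checks that activations stay near zero mean and unit variance across several dense layers when weights use LeCun-normal initialization (std = 1/sqrt(fan_in)).

        import numpy as np

        # Constants derived by Klambauer et al. (2017)
        LAM, ALPHA = 1.0507009873554805, 1.6732632423543772

        def selu(x):
            # lam * x                  for x > 0
            # lam * alpha * (e^x - 1)  for x <= 0
            return LAM * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

        rng = np.random.default_rng(0)
        n = 512                                   # illustrative layer width
        x = rng.standard_normal((10_000, n))      # inputs ~ N(0, 1)

        for layer in range(5):
            W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))  # LeCun init
            x = selu(x @ W)
            print(f"layer {layer}: mean={x.mean():+.3f}  var={x.var():.3f}")

        # Each layer prints mean ~ 0 and var ~ 1: the self-normalizing fixed
        # point, with no batch or layer normalization anywhere in the network.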

    Marketing Relevance

    Showed that normalization can be built into the activation function itself, which inspired research on normalization-free architectures.

    Origin & History

    Klambauer et al. (2017) proved mathematically that SELU networks are self-normalizing. The paper drew wide attention, but practical limitations (the guarantee covers only fully connected layers, and special initialization is required) kept adoption limited.

    Comparisons & Differences

    SELU (Scaled Exponential Linear Unit) vs. ELU

    ELU alone does not normalize activations; SELU scales ELU by the fixed constants λ and α so that outputs automatically stay at zero mean and unit variance.
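    A quick self-contained numerical check (a sketch, not from the source) makes the difference concrete: fed standard normal inputs, plain ELU shifts the mean and shrinks the variance, while SELU keeps both near 0 and 1.

        import numpy as np

        rng = np.random.default_rng(0)
        z = rng.standard_normal(1_000_000)        # inputs ~ N(0, 1)

        elu = np.where(z > 0, z, np.exp(z) - 1.0)                       # plain ELU, alpha = 1
        selu = 1.0507 * np.where(z > 0, z, 1.6733 * (np.exp(z) - 1.0))  # scaled ELU

        print(f"ELU : mean={elu.mean():.3f}  var={elu.var():.3f}")    # ~0.16, ~0.62
        print(f"SELU: mean={selu.mean():.3f}  var={selu.var():.3f}")  # ~0.00, ~1.00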

