
    Residual Connection

    Also known as:
    Skip Connection
    Shortcut Connection
    Identity Mapping
    Residual Link
    Updated: 2/10/2026

    Residual connections add a layer's input to its output, allowing gradients to flow directly through deep networks.

    Quick Summary

    Residual connections add a layer's input to its output (y = f(x) + x), the trick that makes deep networks from ResNet to GPT trainable.

    Explanation

    Formula: output = Layer(x) + x. The addition creates a gradient "shortcut": during backpropagation, gradients can flow through the identity path without being attenuated by the layer's weights. Without residual connections, deep networks (50+ layers) suffer from vanishing gradients. In Transformers, a residual connection follows every attention and FFN sub-layer and is combined with layer normalization ("Add & Norm").
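
    A minimal sketch of the formula in PyTorch (the ResidualBlock wrapper and the dimensions are illustrative assumptions, not taken from any specific paper):

        import torch
        import torch.nn as nn

        class ResidualBlock(nn.Module):
            """Wraps any layer f and returns f(x) + x."""
            def __init__(self, layer: nn.Module):
                super().__init__()
                self.layer = layer

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                # The identity term x gives gradients a direct path backward,
                # bypassing the wrapped layer's weights.
                return self.layer(x) + x

        # Usage: wrap a small feed-forward sub-layer (sizes are arbitrary).
        ffn = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
        block = ResidualBlock(ffn)
        y = block(torch.randn(8, 64))  # output shape (8, 64), same as the input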

    Marketing Relevance

    Without residual connections, neither deep CNNs (ResNet) nor Transformers with 100+ layers would be trainable.

    Common Pitfalls

    Input and output dimensions must match, or the shortcut needs a projection (e.g., a 1x1 convolution or a linear layer). The combination with normalization is critical: Pre-LN (normalize before the sub-layer) trains more stably in very deep Transformers than Post-LN (normalize after the addition). Because the shortcut only adds features, residual connections can limit feature reuse compared to concatenation-based designs. Both pitfalls are sketched below.
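
    A sketch of the first two pitfalls in PyTorch (ProjectedResidual, pre_ln, and post_ln are illustrative names, not a standard API):

        import torch
        import torch.nn as nn

        class ProjectedResidual(nn.Module):
            """Residual block whose layer changes the feature width,
            so the shortcut needs a projection to keep shapes compatible."""
            def __init__(self, d_in: int, d_out: int):
                super().__init__()
                self.layer = nn.Linear(d_in, d_out)
                self.shortcut = nn.Identity() if d_in == d_out else nn.Linear(d_in, d_out)

            def forward(self, x):
                return self.layer(x) + self.shortcut(x)

        def post_ln(x, sublayer, norm):
            # Post-LN (original Transformer): normalize after the addition.
            return norm(x + sublayer(x))

        def pre_ln(x, sublayer, norm):
            # Pre-LN: normalize the sub-layer input; the shortcut stays untouched.
            return x + sublayer(norm(x))

        # Usage
        x = torch.randn(8, 64)
        print(ProjectedResidual(64, 128)(x).shape)                    # torch.Size([8, 128])
        print(pre_ln(x, nn.Linear(64, 64), nn.LayerNorm(64)).shape)   # torch.Size([8, 64])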

    Origin & History

    He et al. (Microsoft, 2015) introduced residual connections in ResNet and won the ImageNet competition with them. The Transformer paper (2017) adopted the concept as "Add & Norm" after each sub-layer. Today, residual connections are standard in virtually every deep learning architecture.

    Comparisons & Differences

    Residual Connection vs. Dense Connections (DenseNet)

    A residual connection adds the input once; DenseNet concatenates the outputs of all previous layers, which gives more feature reuse but costs significantly more memory.
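
    A shape-only sketch of the difference in PyTorch (layer sizes are illustrative):

        import torch
        import torch.nn as nn

        x = torch.randn(8, 64)

        # Residual: the input is added once, the feature width stays constant.
        residual_out = nn.Linear(64, 64)(x) + x          # (8, 64)

        # Dense (DenseNet-style): each step sees the concatenation of all
        # previous feature maps, so the stored width grows with depth.
        features = [torch.randn(8, 16)]
        layers = [nn.Linear(16 * (i + 1), 16) for i in range(3)]
        for layer in layers:
            features.append(layer(torch.cat(features, dim=1)))
        dense_out = torch.cat(features, dim=1)           # (8, 64), with all four blocks kept in memory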
