Skip Connection
Skip connections forward the input of a layer directly to the output of one or more later layers. They are the core mechanism behind ResNet and the Transformer, and the reason networks with 100+ layers are trainable.
Explanation
Instead of learning a direct mapping y = F(x), the network learns y = F(x) + x (residual learning): the layers only need to model the residual, i.e. the deviation from the identity. Because ∂y/∂x = ∂F/∂x + I, gradients reach earlier layers through the identity path even when ∂F/∂x is small, which greatly mitigates the vanishing gradient problem. Every modern transformer places a skip connection around each attention and feed-forward sub-layer.
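As a minimal sketch of the y = F(x) + x pattern (assuming PyTorch; the class name ResidualBlock and the two-layer sub-network are illustrative choices, not taken from any particular paper):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes y = F(x) + x, where F is a small two-layer sub-network."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection: the identity input is added to the residual F(x)
        return self.f(x) + x

# Stacking many blocks stays trainable because gradients flow
# through the identity path unchanged.
model = nn.Sequential(*[ResidualBlock(64) for _ in range(10)])
x = torch.randn(8, 64)
y = model(x)  # shape (8, 64)
```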
Marketing Relevance
Without skip connections, neither ResNets nor Transformers would train reliably at depth, making residual learning one of the most influential innovations in deep learning.
Origin & History
He et al. (2015) introduced residual learning with ResNet, which won the ILSVRC 2015 ImageNet competition. The key insight, that learning a small residual on top of the identity is easier than learning an entirely new mapping, reshaped deep learning. The Transformer (2017) adopted skip connections around every sub-layer as a core component, and DenseNet (2017) extended the idea with dense connections that link each layer to all previous ones.
Comparisons & Differences
ResNet Skip Connections vs. DenseNet
ResNet adds the input element-wise (y = F(x) + x), so the feature width stays constant; DenseNet concatenates the outputs of all previous layers, giving denser information flow but a feature width and memory cost that grow with depth, as the sketch below shows.
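A small illustration of the two combination styles, using hypothetical tensors (assuming PyTorch; the shapes are arbitrary examples):

```python
import torch

x = torch.randn(8, 64)   # block input
fx = torch.randn(8, 64)  # output of the block's transformation F(x)

# ResNet-style: element-wise addition, feature width stays 64
resnet_out = fx + x                        # shape (8, 64)

# DenseNet-style: concatenation along the feature dimension,
# so the width grows with every block (here 64 -> 128)
densenet_out = torch.cat([x, fx], dim=1)   # shape (8, 128)
```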