Residual Connection
Residual connections add a layer's input to its output, allowing gradients to flow directly through deep networks.
Residual connections add input to output (y = f(x) + x) – the trick that makes training deep networks from ResNet to GPT possible.
Explanation
Formula: y = f(x) + x, where f(x) is the layer's transformation (e.g., attention or a feed-forward block). The addition creates a gradient "shortcut": during backpropagation the gradient flows through the identity path unchanged, so it does not have to pass through every weight matrix. Without residual connections, deep networks (50+ layers) suffer from vanishing gradients. In Transformers, a residual connection follows every attention and feed-forward (FFN) sub-layer, combined with layer normalization ("Add & Norm").
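As an illustration, here is a minimal sketch of the "Add & Norm" pattern in PyTorch (the framework and the class name AddAndNorm are assumptions for this example, not code from the Transformer paper):

```python
import torch
import torch.nn as nn

class AddAndNorm(nn.Module):
    """Post-LN residual wrapper: LayerNorm(x + sublayer(x))."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, sublayer: nn.Module) -> torch.Tensor:
        # The "+ x" is the residual connection: gradients flow through it unchanged.
        return self.norm(x + sublayer(x))

# Usage: wrap a feed-forward block the way a Transformer layer would.
d_model = 512
ffn = nn.Sequential(nn.Linear(d_model, 2048), nn.ReLU(), nn.Linear(2048, d_model))
block = AddAndNorm(d_model)
x = torch.randn(8, 16, d_model)   # (batch, sequence, features)
y = block(x, ffn)                 # same shape as x: (8, 16, 512)
```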
Marketing Relevance
Without residual connections, neither deep CNNs (ResNet) nor Transformers with 100+ layers would be trainable.
Common Pitfalls
Input and output dimensions must match, otherwise a projection (e.g., a linear layer or 1x1 convolution) is needed on the skip path. The interplay with normalization is critical: Pre-LN (normalize before the sub-layer) trains more stably in deep Transformers than the original Post-LN (normalize after the addition). Plain addition can also limit feature reuse compared to concatenation. A sketch of the first two points follows below.
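A hedged sketch of both pitfalls (PyTorch assumed; ResidualBlock is an illustrative name): a projection on the skip path when dimensions differ, and a switch between Pre-LN and Post-LN placement:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block with an optional projection on the skip path and a Pre-LN/Post-LN switch."""
    def __init__(self, d_in: int, d_out: int, pre_ln: bool = True):
        super().__init__()
        self.pre_ln = pre_ln
        self.norm = nn.LayerNorm(d_in if pre_ln else d_out)
        self.layer = nn.Linear(d_in, d_out)
        # Projection so x and the layer output can be added when dimensions differ.
        self.proj = nn.Identity() if d_in == d_out else nn.Linear(d_in, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.pre_ln:
            # Pre-LN: normalize inside the residual branch; the skip path stays untouched.
            return self.proj(x) + self.layer(self.norm(x))
        # Post-LN: normalize after the addition (original Transformer style).
        return self.norm(self.proj(x) + self.layer(x))

x = torch.randn(4, 256)
print(ResidualBlock(256, 256)(x).shape)   # torch.Size([4, 256])
print(ResidualBlock(256, 512)(x).shape)   # torch.Size([4, 512]) -- skip path projected
```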
Origin & History
He et al. (Microsoft Research, 2015) introduced residual connections in ResNet and won the ImageNet (ILSVRC 2015) competition with them. The Transformer paper (2017) adopted the concept as "Add & Norm" after each sub-layer. Today it is standard in virtually every deep learning architecture.
Comparisons & Differences
Residual Connection vs. Dense Connections (DenseNet)
Residual adds input once; DenseNet concatenates outputs from all previous layers – more feature reuse but significantly more memory.
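A small sketch (PyTorch assumed, not prescribed by the source) of the structural difference: addition keeps the feature width constant, while concatenation grows it with every layer:

```python
import torch

x = torch.randn(1, 64)               # input features
f_x = torch.randn(1, 64)             # stand-in for a layer's output

residual = x + f_x                   # ResNet-style: shape stays (1, 64)
dense = torch.cat([x, f_x], dim=1)   # DenseNet-style: shape grows to (1, 128)

print(residual.shape, dense.shape)   # torch.Size([1, 64]) torch.Size([1, 128])
```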