Feed-Forward Network (FFN)
In the Transformer context: a two-layer MLP applied independently to each position after the attention layer.
The FFN in Transformers is where most learned knowledge is stored: two linear layers with a nonlinearity in between, accounting for roughly two-thirds of all parameters and processing what attention has retrieved.
Explanation
FFN(x) = GELU(xW₁ + b₁)W₂ + b₂. The inner dimension is typically 4x the model dimension (e.g., d_model = 4096 → d_ff = 16384). This is where "knowledge is stored": attention finds the relevant information, and the FFN processes it. Modern LLMs such as LLaMA replace GELU with SwiGLU.
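A minimal sketch of this position-wise FFN, assuming PyTorch; the class name and dimensions are illustrative, not from any specific library.

```python
import torch
import torch.nn as nn

class TransformerFFN(nn.Module):
    """Two-layer MLP applied independently at every position (illustrative sketch)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff)   # expand: d_model -> d_ff (typically 4x)
        self.w2 = nn.Linear(d_ff, d_model)   # project back: d_ff -> d_model
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); the same weights are applied to each position
        return self.w2(self.act(self.w1(x)))

# Usage: every token position goes through the same two linear layers.
ffn = TransformerFFN(d_model=512, d_ff=2048)
out = ffn(torch.randn(2, 10, 512))   # -> shape (2, 10, 512)
```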
Marketing Relevance
FFN weights account for roughly two-thirds of a Transformer's parameters; this is where most of the model's "knowledge" resides.
Common Pitfalls
The FFN expansion ratio (4x) is where most parameters sit. SwiGLU uses three weight matrices instead of two, so its hidden dimension is reduced to ~8/3x (rather than 4x) to keep the parameter count comparable (see the sketch below). MoE models optimize the FFN through sparse routing rather than by shrinking it.
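A sketch of a LLaMA-style SwiGLU FFN, assuming PyTorch; the exact rounding of the hidden size varies between models, so the sizing rule here is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    """SwiGLU FFN sketch: three weight matrices, hidden size ~8/3 * d_model."""
    def __init__(self, d_model: int):
        super().__init__()
        # Three matrices instead of two, so the hidden dim shrinks to ~8/3x
        # to keep total parameters close to a 4x GELU FFN (illustrative rounding).
        d_ff = int(8 * d_model / 3)
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up   = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: Swish(x W_gate) gated elementwise against x W_up, then projected down
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```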
Origin & History
The position-wise FFN was part of the original Transformer (2017), where it used ReLU; GPT and BERT used GELU instead. SwiGLU (Shazeer, 2020) was adopted by LLaMA (2023) and has since become the norm in modern LLMs. MoE models (e.g., Mixtral, and reportedly GPT-4) make the FFN sparse.
Comparisons & Differences
Feed-Forward Network (FFN) vs. Mixture of Experts (MoE)
Standard FFN: every token passes through all parameters. MoE: a router selects a small subset of expert FFNs per token (e.g., 2 of 8 in Mixtral), giving more total capacity at roughly the same compute per token (see the sketch below).
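A toy top-2-of-8 routing sketch to contrast with the dense FFN above, assuming PyTorch; real MoE layers (e.g., Mixtral) add load balancing, capacity limits, and fused kernels, and the class and routing details here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MoEFFN(nn.Module):
    """Sparse MoE FFN sketch: each token is processed by only top_k of n_experts FFNs."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores all experts, but only top_k run per token.
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e               # tokens that routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out
```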