Sequence-to-Sequence
A model architecture that transforms an input sequence into an output sequence of variable length.
Seq2Seq transforms input sequences into output sequences – the architecture behind translation, summarization, and T5.
Explanation
A Seq2Seq model consists of an encoder, which reads the input sequence and builds a representation of it, and a decoder, which generates the output sequence token by token. Early Seq2Seq models were built with RNNs (LSTMs or GRUs); today the encoder and decoder are usually transformers.
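To make the two-stage structure concrete, here is a minimal sketch of the original RNN-style encoder-decoder in PyTorch. It is purely illustrative: the class name TinySeq2Seq, the layer sizes, and the choice of GRUs are assumptions of this sketch, not code from any particular paper or library.

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Minimal illustrative encoder-decoder (not production code)."""
    def __init__(self, vocab_size=1000, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # The encoder reads the whole input and compresses it into a hidden state.
        _, state = self.encoder(self.embed(src_ids))
        # The decoder generates the output conditioned on that state
        # (teacher forcing: the gold target tokens are fed as decoder input).
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)  # logits over the output vocabulary

model = TinySeq2Seq()
src = torch.randint(0, 1000, (2, 7))  # batch of 2 input sequences, length 7
tgt = torch.randint(0, 1000, (2, 5))  # target sequences, length 5
logits = model(src, tgt)              # shape: (2, 5, 1000)
```

The single state vector passed from encoder to decoder here is exactly the bottleneck discussed under Common Pitfalls below.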
Marketing Relevance
Seq2Seq is the architecture behind machine translation, summarization, chatbots, and many NLP generation tasks.
Example
T5 (Text-to-Text Transfer Transformer) treats every NLP task as Seq2Seq: input text → output text. A task prefix in the input tells the model what to do, e.g. "translate English to German:" or "summarize:".
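A short usage sketch with the Hugging Face transformers library and the public t5-small checkpoint (the library and checkpoint are assumptions of this example, not mentioned in the entry itself; the transformers and sentencepiece packages must be installed):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The task prefix turns translation into a plain text-to-text problem.
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Swapping the prefix to "summarize:" turns the same model into a summarizer, which is the point of the text-to-text framing.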
Common Pitfalls
The fixed-size encoder state creates an information bottleneck, because the entire input must be squeezed into a single vector; the attention mechanism largely solves this (see the sketch below). Exposure bias: during training the decoder is fed the correct previous tokens (teacher forcing), but at inference it must consume its own predictions, so early mistakes can compound. Performance also degrades on very long sequences.
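As a rough illustration of how attention relieves the bottleneck, here is a single dot-product attention step over encoder outputs in PyTorch (the shapes and random tensors are made up for this sketch):

```python
import torch
import torch.nn.functional as F

# Instead of relying on one fixed-size encoder state, the decoder
# re-weights ALL encoder outputs at every generation step.
enc_outputs = torch.randn(1, 7, 128)  # encoder states for a 7-token input
dec_state = torch.randn(1, 1, 128)    # current decoder hidden state (the query)

scores = dec_state @ enc_outputs.transpose(1, 2)  # (1, 1, 7) similarity scores
weights = F.softmax(scores, dim=-1)               # attention distribution
context = weights @ enc_outputs                   # (1, 1, 128) weighted summary
# 'context' is combined with the decoder state to predict the next token.
```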
Origin & History
Sutskever et al. (Google, 2014) published the seminal Seq2Seq paper, using LSTMs for machine translation. Bahdanau et al. (2015) added the attention mechanism. The Transformer (Vaswani et al., 2017) replaced the recurrent encoder and decoder with self-attention. T5 (Raffel et al., 2020) unified all NLP tasks as text-to-text Seq2Seq.
Comparisons & Differences
Sequence-to-Sequence vs. Decoder-Only (GPT)
Seq2Seq combines an encoder and a decoder, which suits tasks that transform one sequence into another, such as translation or summarization. Decoder-only models (GPT) keep only the decoder and continue a prompt, which suits open-ended generation.
Sequence-to-Sequence vs. Encoder-Only (BERT)
BERT keeps only the encoder and is suited to understanding and classification; on its own it cannot generate text. A Seq2Seq model has both components and can generate output sequences (see the sketch below).
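The three architecture families map onto distinct model classes in the Hugging Face transformers library; a minimal sketch, assuming that library and these public checkpoints:

```python
from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

seq2seq = AutoModelForSeq2SeqLM.from_pretrained("t5-small")    # encoder + decoder
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")    # decoder only
encoder_only = AutoModel.from_pretrained("bert-base-uncased")  # encoder only

print(type(seq2seq).__name__, type(decoder_only).__name__, type(encoder_only).__name__)
```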