Sinusoidal Positional Encoding
The original positional encoding from the Transformer paper, using sine and cosine functions of different frequencies.
Sinusoidal encoding injects position information as sin/cos waves of different frequencies added to the token embeddings; it was the historically first solution to the position problem, introduced with the Transformer (2017).
Explanation
PE(pos, 2i) = sin(pos/10000^(2i/d)), PE(pos, 2i+1) = cos(pos/10000^(2i/d)), where d is the embedding dimension. The wavelengths form a geometric progression from 2π to 10000·2π, so each dimension pair oscillates at a different frequency. Advantage: because the encoding is a deterministic function of the position, it can in theory be computed for arbitrary sequence lengths.
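To make the formula concrete, here is a minimal NumPy sketch (an illustration, not a reference implementation: the function name is an assumption, and d is assumed even so the sin/cos dimensions pair up):

```python
import numpy as np

def sinusoidal_encoding(max_len: int, d: int) -> np.ndarray:
    """Return a (max_len, d) matrix of sinusoidal positional encodings; d must be even."""
    positions = np.arange(max_len)[:, np.newaxis]      # shape (max_len, 1)
    two_i = np.arange(0, d, 2)                         # even dimension indices 2i
    angle_rates = 1.0 / np.power(10000.0, two_i / d)   # 1 / 10000^(2i/d)
    angles = positions * angle_rates                   # shape (max_len, d/2)

    pe = np.zeros((max_len, d))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sin
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cos
    return pe

# Usage: the encoding is added to (not concatenated with) the token embeddings,
# e.g. x = token_embeddings + sinusoidal_encoding(seq_len, d)
```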
Marketing Relevance
Historically important as the first solution for injecting position information into Transformers; today it has largely been replaced by RoPE or learned embeddings.
Common Pitfalls
In practice it generalizes poorly to sequence lengths unseen during training, despite the theoretical argument above. It encodes absolute rather than relative position. Modern LLMs use RoPE instead of sinusoidal encoding.
Origin & History
Vaswani et al. (2017) chose sinusoidal encoding hypothesizing that it would let the model attend to relative positions, since PE(pos+k) is a linear function of PE(pos) (see the derivation below). BERT (2018) replaced it with learned positional embeddings; RoPE (2021) and ALiBi (2022) later superseded both.
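The linear-transformation property follows directly from the angle-addition identities; a short standalone derivation (standard trigonometry, not quoted from the paper):

```latex
% With frequency \omega_i = 10000^{-2i/d}, a fixed offset k acts on each
% (sin, cos) dimension pair as a rotation whose matrix depends only on k:
\begin{pmatrix} \sin(\omega_i (pos+k)) \\ \cos(\omega_i (pos+k)) \end{pmatrix}
=
\begin{pmatrix} \cos(\omega_i k) & \sin(\omega_i k) \\ -\sin(\omega_i k) & \cos(\omega_i k) \end{pmatrix}
\begin{pmatrix} \sin(\omega_i \, pos) \\ \cos(\omega_i \, pos) \end{pmatrix}
```

Because the 2×2 matrix is independent of pos, PE(pos+k) is a fixed linear function of PE(pos) for every offset k.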
Comparisons & Differences
Sinusoidal Positional Encoding vs. Learned Positional Embeddings
Sinusoidal is deterministic and parameter-free; learned embeddings are trained, which makes them more flexible but leaves them undefined beyond the training length (see the sketch below).
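A minimal PyTorch sketch of the contrast (class names are illustrative assumptions; `sinusoidal_encoding` is the NumPy function sketched above):

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """BERT-style trainable table: flexible, but undefined past max_len."""
    def __init__(self, max_len: int, d: int):
        super().__init__()
        self.pos = nn.Embedding(max_len, d)  # adds max_len * d trainable parameters

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d)
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.pos(positions)

class SinusoidalPositionalEncoding(nn.Module):
    """Deterministic table: zero parameters, computable for any position."""
    def __init__(self, max_len: int, d: int):
        super().__init__()
        pe = torch.from_numpy(sinusoidal_encoding(max_len, d)).float()
        self.register_buffer("pe", pe)  # stored with the model, never trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d)
        return x + self.pe[: x.size(1)]
```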
Sinusoidal Positional Encoding vs. RoPE
Sinusoidal adds a position vector to the token embedding; RoPE instead rotates the query/key vectors by position-dependent angles, which captures relative positions directly and extends to longer contexts with techniques like YaRN (sketched below).
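To make the rotation idea concrete, a simplified single-vector NumPy sketch (the interleaved pairing and names are assumptions; real RoPE implementations differ in layout and operate on full Q/K tensors):

```python
import numpy as np

def rope_rotate(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive (even, odd) pairs of one Q or K vector by position-dependent angles."""
    d = x.shape[-1]                                       # d assumed even
    theta = pos / np.power(base, np.arange(0, d, 2) / d)  # same frequency schedule as sinusoidal PE
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * np.cos(theta) - x_odd * np.sin(theta)
    out[1::2] = x_even * np.sin(theta) + x_odd * np.cos(theta)
    return out

# The dot product of two rotated vectors depends only on the difference of their
# positions, which is how RoPE turns absolute rotations into relative information.
```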