ALiBi (Attention with Linear Biases)
A method for position encoding that adds linear biases directly to attention scores instead of learning position embeddings.
ALiBi encodes position through linear attention biases – no learned parameters, natural extrapolation to longer contexts.
Explanation
ALiBi adds a negative linear bias to each attention score, proportional to the distance between the query and key positions: the farther apart they are, the more the attention is dampened. Each head uses a fixed, head-specific slope (a geometric sequence, e.g. 1/2, 1/4, ..., 1/256 for 8 heads), so heads penalize distance at different rates. ALiBi requires no learned parameters and naturally extrapolates to contexts longer than those seen during training.
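A minimal sketch of the idea, assuming a causal decoder and using NumPy for clarity (function names are illustrative, not the paper's reference code):

```python
import numpy as np

def alibi_slopes(num_heads: int) -> np.ndarray:
    # Head-specific slopes from the paper: a geometric sequence with
    # ratio 2^(-8/n) for n heads (n a power of two); e.g. for 8 heads:
    # 1/2, 1/4, ..., 1/256.
    ratio = 2.0 ** (-8.0 / num_heads)
    return ratio ** np.arange(1, num_heads + 1)

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    # distance[i, j] = i - j: how far key j lies behind query i.
    pos = np.arange(seq_len)
    distance = pos[:, None] - pos[None, :]            # (seq_len, seq_len)
    # Bias = -slope * distance; grows more negative with distance.
    return -alibi_slopes(num_heads)[:, None, None] * distance

# Usage: add the bias to the raw attention scores before the softmax.
scores = np.random.randn(8, 16, 16)                   # (heads, queries, keys)
scores = scores + alibi_bias(num_heads=8, seq_len=16)
# Future positions (j > i) are handled by the usual causal mask.
```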
Marketing Relevance
ALiBi was one of the first position-encoding methods to enable efficient context-length extrapolation and is used in BLOOM and MPT.
Common Pitfalls
ALiBi is less common than RoPE in newer models. The linear-bias assumption can also be suboptimal for very long contexts, since distant tokens are penalized so heavily that they contribute little to attention.
Origin & History
Press et al. (2021) introduced ALiBi in "Train Short, Test Long" and showed strong extrapolation without training on long contexts. BLOOM (BigScience, 2022) and MPT (MosaicML, 2023) adopted ALiBi. RoPE has largely superseded it in newer models (Llama, Mistral).
Comparisons & Differences
ALiBi (Attention with Linear Biases) vs. RoPE
RoPE rotates the Q/K vectors (modifying the representations); ALiBi adds a bias to the attention scores (modifying the attention weights). RoPE dominates in newer LLMs; the sketch below shows where position enters in each case.
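A toy single-head sketch of that difference, assuming NumPy and a simplified RoPE (pairwise 2-D rotations; a symmetric distance is used for the ALiBi term for brevity):

```python
import numpy as np

def rope_rotate(x: np.ndarray, pos: np.ndarray, base: float = 10000.0) -> np.ndarray:
    # RoPE: rotate each consecutive pair of dimensions by an angle
    # that grows linearly with position (one frequency per pair).
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)       # (d/2,)
    angles = pos[:, None] * inv_freq[None, :]          # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

seq, d = 16, 64
q, k = np.random.randn(seq, d), np.random.randn(seq, d)
pos = np.arange(seq)

# RoPE: position enters *before* the dot product, inside the vectors.
scores_rope = rope_rotate(q, pos) @ rope_rotate(k, pos).T / np.sqrt(d)

# ALiBi: position enters *after* the dot product, as a score-level bias.
slope = 0.5                                            # one head's fixed slope
scores_alibi = q @ k.T / np.sqrt(d) - slope * np.abs(pos[:, None] - pos[None, :])
```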
ALiBi (Attention with Linear Biases) vs. Sinusoidal Positional Encoding
Sinusoidal encoding adds fixed position embeddings to the input; ALiBi modifies the attention scores directly and stores no embedding table, so it adds no extra memory (see the sketch below).
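For contrast, a minimal sketch of the classic sinusoidal table, assuming NumPy; position is added to the embeddings once, before attention:

```python
import numpy as np

def sinusoidal_encoding(seq_len: int, d_model: int, base: float = 10000.0) -> np.ndarray:
    # PE[pos, 2i] = sin(pos / base^(2i/d)), PE[pos, 2i+1] = cos(...)
    pos = np.arange(seq_len)[:, None]                  # (seq, 1)
    inv_freq = base ** (-np.arange(0, d_model, 2) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(pos * inv_freq)
    pe[:, 1::2] = np.cos(pos * inv_freq)
    return pe

# Sinusoidal: a (seq_len, d_model) table is added to the token embeddings.
x = np.random.randn(16, 64)                            # token embeddings
x = x + sinusoidal_encoding(16, 64)
# ALiBi instead leaves x untouched and biases the attention scores,
# so no position table is stored at all.
```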