
    ALiBi (Attention with Linear Biases)

    Also known as:
    Attention with Linear Biases
    Linear Position Bias
    Updated: 2/11/2026

    A method for position encoding that adds linear biases directly to attention scores instead of learning position embeddings.

    Quick Summary

    ALiBi encodes position through linear biases added to attention scores: no learned position parameters, and natural extrapolation to contexts longer than those seen in training.

    Explanation

    ALiBi adds a negative linear bias to each attention score, proportional to the distance between the query and key positions: the farther apart they are, the more the score is dampened. Each attention head uses its own fixed slope for this bias. ALiBi requires no learned position parameters and naturally extrapolates to sequences longer than those seen in training.
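    As a rough illustration, here is a minimal NumPy sketch of how the bias can be built and applied. The function names and tensor shapes are illustrative assumptions, not the API of any particular library.

```python
import numpy as np

def alibi_slopes(num_heads: int) -> np.ndarray:
    """Head-specific slopes: a geometric sequence starting at 2**(-8 / num_heads).

    Follows the recipe described by Press et al. for a power-of-two head count."""
    start = 2.0 ** (-8.0 / num_heads)
    return start ** np.arange(1, num_heads + 1)

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """Bias tensor of shape (num_heads, seq_len, seq_len).

    Entry [h, i, j] equals -slope_h * (i - j): zero when query and key positions
    coincide, increasingly negative as the key position j falls behind the query i."""
    slopes = alibi_slopes(num_heads)                        # (H,)
    pos = np.arange(seq_len)
    distance = pos[:, None] - pos[None, :]                  # (L, L), i - j
    return -slopes[:, None, None] * distance[None, :, :]    # (H, L, L)

def scores_with_alibi(q: np.ndarray, k: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Add the ALiBi bias to the scaled dot-product scores, before the softmax.

    q, k: (num_heads, seq_len, head_dim); with a causal mask only j <= i is used."""
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return scores + bias
```

    Because the bias depends only on sequence length and head count, it can simply be recomputed for any context size at inference time, which is what gives ALiBi its extrapolation behavior.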

    Marketing Relevance

    ALiBi was one of the first position-encoding methods to enable efficient length extrapolation and is used in BLOOM and MPT.

    Common Pitfalls

    ALiBi is less common than RoPE in newer models, and its assumption of a linearly growing distance penalty can be suboptimal for very long contexts, where distant tokens are strongly down-weighted even when they remain relevant.

    Origin & History

    Press et al. (2021) introduced ALiBi and showed strong extrapolation without training on long contexts. BLOOM (BigScience, 2022) and MPT (MosaicML, 2023) used ALiBi. RoPE has largely superseded ALiBi in newer models (Llama, Mistral).

    Comparisons & Differences

    ALiBi (Attention with Linear Biases) vs. RoPE

    RoPE rotates the query and key vectors (modifying the representations), while ALiBi adds a bias to the attention scores (modifying the attention weights). RoPE dominates in newer LLMs.
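    Schematically (ignoring the 1/sqrt(d) scaling), the two mechanisms inject position at different points in the score computation. The notation below is a sketch for the causal case, with m the head-specific ALiBi slope and R the RoPE rotation matrix for a given position:

```latex
% ALiBi: position enters as an additive bias on the raw attention score
\mathrm{score}^{\mathrm{ALiBi}}_{ij} = q_i^{\top} k_j \;-\; m\,(i - j), \qquad j \le i

% RoPE: position enters by rotating the query and key vectors before the dot product
\mathrm{score}^{\mathrm{RoPE}}_{ij} = (R_{\Theta,i}\, q_i)^{\top} (R_{\Theta,j}\, k_j) \;=\; q_i^{\top} R_{\Theta,\, j-i}\, k_j
```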

    ALiBi (Attention with Linear Biases) vs. Sinusoidal Positional Encoding

    Sinusoidal position encoding adds fixed position embeddings to the input token embeddings, while ALiBi modifies the attention scores directly and needs no additional memory for position embeddings.
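    For contrast, a sketch of where each method acts (d is the model dimension and e_pos the token embedding at position pos):

```latex
% Sinusoidal: fixed position vectors are added to the token embeddings at the input
PE_{(pos,\,2i)} = \sin\!\bigl(pos / 10000^{2i/d}\bigr), \quad
PE_{(pos,\,2i+1)} = \cos\!\bigl(pos / 10000^{2i/d}\bigr), \quad
x_{pos} = e_{pos} + PE_{pos}

% ALiBi: nothing is added to the embeddings; each attention score receives the bias -m (i - j)
```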

