
    RetNet (Retentive Network)

    Also known as:
    Retentive Network
    Retention Network
    Updated: 2/11/2026

    An architecture from Microsoft combining Transformer quality with linear inference complexity through a "retention" mechanism.

    Quick Summary

    RetNet offers three compute modes (parallel, recurrent, chunk-wise) and achieves Transformer quality with O(1) inference per token.

    Explanation

    RetNet offers three compute modes: parallel training (like a Transformer), recurrent inference (O(1) cost per token, like an RNN), and chunk-wise processing (a hybrid of the two). The retention mechanism replaces softmax attention with an exponentially decayed weighted sum over past tokens, which admits both a parallel and a recurrent formulation that produce the same outputs.
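
    A minimal sketch of that equivalence (assuming NumPy; a single head with a fixed decay gamma, omitting the paper's xPos rotation, multi-scale decays, gating, and normalization):

        # Minimal retention sketch: parallel vs. recurrent form (illustrative only).
        import numpy as np

        def retention_parallel(Q, K, V, gamma):
            """Parallel (training) form: (Q K^T, masked by causal decay) V."""
            n = Q.shape[0]
            idx = np.arange(n)
            # D[i, j] = gamma**(i - j) for j <= i, else 0 (causal, exponentially decayed)
            D = np.where(idx[:, None] >= idx[None, :],
                         gamma ** (idx[:, None] - idx[None, :]), 0.0)
            return (Q @ K.T * D) @ V

        def retention_recurrent(Q, K, V, gamma):
            """Recurrent (inference) form: constant-size state, O(1) work per token."""
            S = np.zeros((K.shape[1], V.shape[1]))
            out = []
            for q, k, v in zip(Q, K, V):
                S = gamma * S + np.outer(k, v)   # S_n = gamma * S_{n-1} + k_n^T v_n
                out.append(q @ S)                # o_n = q_n S_n
            return np.stack(out)

        # Both forms give the same result (up to floating-point error).
        rng = np.random.default_rng(0)
        Q, K, V = rng.normal(size=(3, 8, 4))
        assert np.allclose(retention_parallel(Q, K, V, 0.9),
                           retention_recurrent(Q, K, V, 0.9))

    The recurrent path only touches a fixed d x d state per step, which is where the O(1) per-token cost comes from; the parallel path computes identical outputs but parallelizes over the whole sequence, like attention during training.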

    Marketing Relevance

    RetNet promises "the impossible": Transformer-quality training combined with O(1) per-token inference. However, this claim has not yet been validated in large production models.

    Common Pitfalls

    No large production models yet. Quality claims not independently replicated. More complex implementation than standard Transformer.

    Origin & History

    Sun et al. (Microsoft Research, 2023) introduced RetNet. The paper showed promising results at up to 6.7B parameters. However, it has so far seen no adoption in large open-source or commercial models.

    Comparisons & Differences

    RetNet (Retentive Network) vs. Transformer

    Transformer: inference memory grows linearly with sequence length (O(N) KV cache); RetNet: O(1) inference memory via its recurrent mode.
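
    A back-of-the-envelope illustration of that difference (assumptions: one layer, one head, head dimension d, fp16 values; the function names are purely illustrative):

        # KV cache grows with sequence length n; the recurrent retention state does not.
        def kv_cache_bytes(n, d, bytes_per_val=2):
            return 2 * n * d * bytes_per_val      # keys + values stored for all n tokens

        def retention_state_bytes(d, bytes_per_val=2):
            return d * d * bytes_per_val          # one d x d state, independent of n

        for n in (1_000, 10_000, 100_000):
            print(n, kv_cache_bytes(n, 128), retention_state_bytes(128))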

    RetNet (Retentive Network) vs. Mamba

    Mamba uses selective state-space models (SSMs); RetNet uses retention (exponentially decayed weighted sums). Both target linear-cost inference, but by different routes.
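
    In simplified recurrence form (a sketch that omits Mamba's discretization details and RetNet's multi-scale heads), the contrast is roughly:

        % RetNet: fixed scalar decay gamma per head
        \text{RetNet: } S_n = \gamma\, S_{n-1} + k_n^{\top} v_n, \qquad o_n = q_n S_n
        % Mamba: transition and input matrices depend on the input x_t ("selective")
        \text{Mamba: } h_t = \bar{A}_t\, h_{t-1} + \bar{B}_t\, x_t, \qquad y_t = C_t h_t

    Both keep a fixed-size state per step; the key difference is that Mamba's transition is input-dependent, while RetNet's decay is fixed.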

