RetNet (Retentive Network)
An architecture from Microsoft Research that combines Transformer-level quality with linear-time, constant-memory inference through a "retention" mechanism.
RetNet supports three compute modes (parallel, recurrent, chunk-wise) and targets Transformer quality at O(1) inference cost per token.
Explanation
RetNet offers three compute modes: parallel for training (like a Transformer), recurrent for inference (O(1) per token, like an RNN), and chunk-wise processing for long sequences (a hybrid of the two). The retention mechanism replaces softmax attention with exponentially decay-weighted sums over past tokens; because no softmax couples the scores, the same computation can be rewritten as a recurrence over a fixed-size state, which is what makes the O(1) inference mode possible (see the sketch below).
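The following is a minimal numerical sketch of that parallel/recurrent equivalence for a single retention head, assuming a fixed decay factor and omitting the paper's multi-scale heads, gating, and normalization; all names here are illustrative, not taken from the reference implementation:

```python
import numpy as np

def parallel_retention(Q, K, V, gamma):
    """Parallel mode: full decay-masked score matrix, as used for training."""
    n = Q.shape[0]
    idx = np.arange(n)
    # D[i, j] = gamma^(i - j) for j <= i (causal exponential decay), else 0.
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V  # no softmax: plain decay-weighted sums

def recurrent_retention(Q, K, V, gamma):
    """Recurrent mode: one fixed-size state matrix, O(1) work per token."""
    state = np.zeros((K.shape[1], V.shape[1]))
    outputs = []
    for q, k, v in zip(Q, K, V):
        state = gamma * state + np.outer(k, v)  # decay old info, add new token
        outputs.append(q @ state)
    return np.stack(outputs)

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 6, 4))  # three matrices: seq len 6, dim 4
assert np.allclose(parallel_retention(Q, K, V, gamma=0.9),
                   recurrent_retention(Q, K, V, gamma=0.9))
```

Both functions compute, for each position n, the sum over m <= n of gamma^(n-m) * (q_n . k_m) * v_m; the recurrent form merely folds that sum into a single state matrix updated once per step.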
Marketing Relevance
RetNet promises "the impossible": Transformer-quality training with O(1) inference. However, this has not yet been validated in large production models.
Common Pitfalls
There are no large production models yet, the quality claims have not been independently replicated, and the implementation is more complex than a standard Transformer's.
Origin & History
Sun et al. (Microsoft Research, 2023) introduced RetNet. The paper showed promising results at 6.7B parameters; however, the architecture has seen no adoption in large open-source or commercial models so far.
Comparisons & Differences
RetNet (Retentive Network) vs. Transformer
Transformer: the KV cache grows with the number of past tokens, so inference memory is O(N); RetNet: recurrent mode maintains a fixed-size state, so inference memory is O(1) (see the back-of-envelope comparison below).
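A back-of-envelope sketch of what that difference means in bytes, assuming a single head, head dimension 64, and float32 values; the functions and constants are illustrative, not taken from either implementation:

```python
# Approximate inference memory for one head after generating n_tokens tokens.
D = 64                 # head dimension (assumed)
BYTES_PER_FLOAT = 4    # float32

def transformer_cache_bytes(n_tokens: int) -> int:
    # The KV cache stores one key and one value vector per past token: O(N).
    return n_tokens * 2 * D * BYTES_PER_FLOAT

def retnet_state_bytes(n_tokens: int) -> int:
    # Recurrent retention keeps a single D x D state matrix: O(1).
    return D * D * BYTES_PER_FLOAT

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens: cache {transformer_cache_bytes(n):>10} B, "
          f"state {retnet_state_bytes(n):>6} B")
```

At 100,000 tokens the cache is more than three orders of magnitude larger than the constant retention state, which is the practical content of the O(N) vs. O(1) claim.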
RetNet (Retentive Network) vs. Mamba
Mamba uses selective state-space models whose state transitions depend on the input; RetNet uses retention with fixed exponential decay factors. These are two different routes to the same goal: linear-time, constant-memory inference.