RWKV (Receptance Weighted Key Value)
An open-source architecture that combines RNN-style inference (O(1) per token, no KV cache) with Transformer-like parallelizability during training; open models are available up to 14B parameters.
Explanation
RWKV replaces attention with a WKV mechanism (weighted key-value aggregation with exponential decay). Training is computed in parallel, like a Transformer; inference is recurrent, like an RNN, carrying only a small fixed-size state. Models of up to 14B parameters are available.
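A minimal sketch of that recurrence, assuming the RWKV-4-style WKV formulation: it omits the receptance gate that multiplies the output and the running-max trick real implementations use for numerical stability. The parameter names w (per-channel decay) and u (bonus for the current token) follow the usual RWKV-4 description; the function itself is illustrative, not the reference implementation.

```python
import numpy as np

def wkv_recurrent(k, v, w, u):
    """Sketch of the RWKV-4 WKV recurrence, computed per channel.

    k, v : arrays of shape (T, C), key and value series
    w    : per-channel decay (positive), shape (C,)
    u    : per-channel bonus for the current token, shape (C,)

    The state is just two vectors (num, den) of shape (C,):
    O(1) memory per token, no KV cache.
    """
    T, C = k.shape
    num = np.zeros(C)          # decayed, weighted sum of past values
    den = np.zeros(C)          # decayed sum of past key weights
    out = np.empty((T, C))
    for t in range(T):
        ek = np.exp(k[t])
        # the current token enters the output with an extra bonus u ...
        out[t] = (num + np.exp(u + k[t]) * v[t]) / (den + np.exp(u + k[t]))
        # ... then the state is decayed and the current token absorbed
        num = np.exp(-w) * num + ek * v[t]
        den = np.exp(-w) * den + ek
    return out
```

During training the same quantity can be computed for all positions in parallel, which is what gives RWKV its Transformer-like training throughput.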
Marketing Relevance
RWKV is the only community-driven Transformer alternative with large trained models and active development.
Common Pitfalls
A quality gap remains compared to Transformer models of the same size on complex reasoning tasks; the community is smaller and the tooling less mature.
Origin & History
Bo Peng began developing RWKV in 2022 as a community project. RWKV-4 (2023) showed competitive results; RWKV-5 "Eagle" and RWKV-6 "Finch" (2024) further improved quality. The RWKV Foundation coordinates the open-source development.
Comparisons & Differences
RWKV (Receptance Weighted Key Value) vs. Transformer
Transformers need a KV cache that grows with sequence length (O(N) memory); RWKV needs only a fixed-size state (O(1)), making it significantly more memory-efficient at inference.
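A back-of-the-envelope sketch of the difference: the configuration below (32 layers, 32 heads, head dimension 128, model width 4096, fp16 values) is a hypothetical 7B-class example, and the assumption of five state vectors per layer matches RWKV-4; newer RWKV versions keep small per-head matrices instead, still independent of sequence length.

```python
def kv_cache_bytes(n_tokens, n_layers, n_heads, head_dim, bytes_per_val=2):
    # Transformer KV cache: one key and one value vector per token, per layer.
    # Memory grows linearly with the number of tokens (O(N)).
    return n_tokens * n_layers * 2 * n_heads * head_dim * bytes_per_val

def rwkv4_state_bytes(n_layers, d_model, bytes_per_val=2):
    # RWKV-4 recurrent state: assumed five vectors per layer here.
    # The size is fixed, independent of how many tokens were processed (O(1)).
    return n_layers * 5 * d_model * bytes_per_val

# Hypothetical 7B-class configuration, fp16 values:
print(f"{kv_cache_bytes(32_000, 32, 32, 128) / 2**30:.1f} GiB KV cache at 32k tokens")
print(f"{rwkv4_state_bytes(32, 4096) / 2**20:.2f} MiB RWKV state at any length")
```

Under these assumptions the KV cache reaches roughly 15 GiB at a 32k-token context, while the RWKV state stays around 1 MiB regardless of context length.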
RWKV (Receptance Weighted Key Value) vs. Mamba
Mamba uses selective state-space models (SSMs); RWKV uses linear attention with the WKV mechanism. Mamba has more academic validation, while RWKV has more trained models available.