Jamba
AI21 Labs' hybrid architecture combining Transformer attention with Mamba SSM layers and MoE for efficient long contexts.
Jamba is AI21 Labs' hybrid of Transformer + Mamba + MoE – 256K context with a roughly 3x smaller KV cache than pure Transformers.
Explanation
Jamba interleaves Transformer blocks (with attention) and Mamba blocks (with SSM); MoE layers are used in both block types, so only a fraction of the total parameters is active per token. The result is a 256K-token context with a roughly 3x smaller KV cache than comparable Transformers, at 52B total parameters of which 12B are active.
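The layer layout can be pictured as a stack in which only every few layers is an attention layer, the rest are Mamba layers, and some of the MLPs are swapped for MoE layers. The PyTorch-style sketch below illustrates that interleaving only; the class names, hyperparameters, and the simplified stand-in for the selective SSM are assumptions chosen for readability, not AI21's implementation or published configuration (the Jamba paper describes roughly one attention layer per eight layers, MoE in every second MLP, and 16 experts with top-2 routing).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch of Jamba-style interleaving -- not AI21's code.
# All names and hyperparameters are placeholders chosen for readability.

def ffn(d_model):
    """Plain two-layer feed-forward net, reused for dense MLPs and MoE experts."""
    return nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                         nn.Linear(4 * d_model, d_model))

class MambaLikeBlock(nn.Module):
    """Stand-in for a Mamba (selective SSM) layer: a gated causal depthwise conv.
    Only the interface matters here -- sequence in, sequence out, no KV cache."""
    def __init__(self, d_model):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=4, padding=3, groups=d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):                                    # x: (batch, seq, d_model)
        h, gate = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        h = self.conv(h.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)  # causal
        return x + self.out_proj(F.silu(gate) * h)

class AttentionBlock(nn.Module):
    """Causal self-attention layer -- the only layer type that needs a KV cache."""
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        mask = torch.triu(torch.ones(x.size(1), x.size(1)), diagonal=1).bool()
        out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        return x + out

class DenseMLP(nn.Module):
    """Ordinary feed-forward layer used wherever no MoE layer is placed."""
    def __init__(self, d_model):
        super().__init__()
        self.norm, self.ff = nn.LayerNorm(d_model), ffn(d_model)

    def forward(self, x):
        return x + self.ff(self.norm(x))

class MoEMLP(nn.Module):
    """Top-2 mixture-of-experts MLP: all experts count toward total parameters,
    but each token only passes through two of them (active parameters)."""
    def __init__(self, d_model, n_experts=4, top_k=2):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(ffn(d_model) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):
        h = self.norm(x)
        weights = self.router(h).softmax(dim=-1)              # (batch, seq, n_experts)
        topw, topi = weights.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                sel = topi[..., k] == e                       # tokens routed to expert e
                if sel.any():
                    out[sel] += topw[..., k][sel].unsqueeze(-1) * expert(h[sel])
        return x + out

def build_jamba_like_stack(d_model=512, n_layers=8, attn_every=8, moe_every=2):
    """One attention layer per `attn_every` layers (the rest Mamba-style);
    every `moe_every`-th MLP is an MoE layer, the others are dense."""
    layers = []
    for i in range(n_layers):
        layers.append(AttentionBlock(d_model) if (i + 1) % attn_every == 0
                      else MambaLikeBlock(d_model))
        layers.append(MoEMLP(d_model) if (i + 1) % moe_every == 0
                      else DenseMLP(d_model))
    return nn.Sequential(*layers)

model = build_jamba_like_stack()
print(model(torch.randn(2, 16, 512)).shape)                   # torch.Size([2, 16, 512])
```

With `attn_every=8`, only one layer in eight keeps a KV cache, which is where the long-context memory saving comes from; the MoE layers are why the total parameter count (all experts) is much larger than the active parameter count (only the routed experts).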
Marketing Relevance
Jamba shows that hybrid architectures (Attention + SSM) can combine the strengths of both approaches.
Common Pitfalls
The more complex architecture hinders community adoption. To date, only AI21 Labs has trained Jamba models. The optimal ratio of attention to Mamba blocks is still unclear.
Origin & History
AI21 Labs released Jamba in March 2024 as the first production-ready Mamba hybrid model. Jamba 1.5 followed later in 2024 with a 256K context window and competitive performance against Llama 3 70B.
Comparisons & Differences
Jamba vs. Llama 3
Llama 3 is a pure Transformer and therefore carries a large KV cache; Jamba replaces most attention layers with SSM blocks, giving a drastically smaller cache at comparable quality (see the back-of-the-envelope calculation below).
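The saving follows from the fact that only attention layers accumulate keys and values: at a fixed sequence length, the cache shrinks roughly in proportion to the share of attention layers in the stack. The calculation below is a minimal sketch; all layer and head counts are invented purely for illustration, and the actual factor depends on the concrete model configurations being compared.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, bytes_per_val=2):
    # Keys + values (factor 2) for every attention layer; fp16/bf16 -> 2 bytes per value.
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_val

ctx = 256_000
# Hypothetical configs: a pure Transformer with 32 attention layers vs. a
# Jamba-style hybrid in which only 4 of the 32 layers are attention layers.
pure   = kv_cache_bytes(n_attn_layers=32, n_kv_heads=8, head_dim=128, seq_len=ctx)
hybrid = kv_cache_bytes(n_attn_layers=4,  n_kv_heads=8, head_dim=128, seq_len=ctx)
print(f"pure: {pure / 2**30:.1f} GiB, hybrid: {hybrid / 2**30:.1f} GiB, "
      f"factor {pure / hybrid:.0f}x")   # ~31 GiB vs. ~4 GiB with these made-up numbers
```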
Jamba vs. Mamba
Pure Mamba lacks attention for in-context learning; Jamba uses strategically placed attention blocks for better reasoning ability.