
    Jamba

    Updated: 2/11/2026

    AI21 Labs' hybrid architecture combining Transformer attention with Mamba SSM layers and MoE for efficient long contexts.

    Quick Summary

    Jamba is AI21 Labs' hybrid of Transformer, Mamba, and MoE layers: a 256K-token context with a 3x smaller KV cache than pure Transformers.

    Explanation

    Jamba interleaves Transformer blocks (attention) with Mamba blocks (state-space model, SSM); Mixture-of-Experts (MoE) layers are used in both block types. The result is a 256K-token context with a 3x smaller KV cache than comparable Transformers, at 52B total parameters of which only 12B are active per token.
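The interleaving can be sketched as a layer schedule. A minimal sketch in plain Python, assuming the block pattern reported in the Jamba paper (one attention layer per eight layers, MoE replacing the MLP in every other layer); the function name and parameters are illustrative, not AI21's actual code:

```python
# Hedged sketch of a Jamba-style layer schedule (illustrative, not AI21's code).
# Assumption from the Jamba paper: 1 attention layer per attn_period layers
# (the rest are Mamba), and MoE replaces the MLP in every other layer.

def jamba_layer_schedule(n_layers: int = 32,
                         attn_period: int = 8,
                         moe_period: int = 2) -> list[str]:
    """Return one label per layer, e.g. 'attention+moe' or 'mamba+mlp'."""
    schedule = []
    for i in range(n_layers):
        mixer = "attention" if i % attn_period == 0 else "mamba"
        ffn = "moe" if i % moe_period == 1 else "mlp"
        schedule.append(f"{mixer}+{ffn}")
    return schedule

sched = jamba_layer_schedule()
# Only the attention layers contribute to the KV cache; Mamba layers
# carry a fixed-size recurrent state regardless of context length.
kv_layers = sum(1 for s in sched if s.startswith("attention"))
```

With 32 layers and a 1:7 attention-to-Mamba ratio, only 4 layers need a KV cache, which is where the memory savings over a pure Transformer come from.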

    Marketing Relevance

    Jamba shows that hybrid architectures (attention + SSM) can combine the strengths of both approaches: Transformer-level quality with SSM-level memory efficiency.

    Common Pitfalls

    The more complex architecture hinders community adoption and tooling support. So far, only AI21 Labs has trained Jamba models. The optimal ratio of attention to Mamba blocks remains an open research question.

    Origin & History

    AI21 Labs released Jamba in March 2024 as the first production-ready Mamba hybrid model. Jamba 1.5 (2024) extended to 256K context and showed competitive performance against Llama 3 70B.

    Comparisons & Differences

    Jamba vs. Llama 3

    Llama 3 is a pure Transformer, so every layer contributes to the KV cache; Jamba replaces most attention layers with SSM blocks, yielding a drastically smaller cache at comparable quality.
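The cache difference follows from simple arithmetic: KV cache size scales with context length times the number of attention layers. A minimal sketch, assuming illustrative head counts and dimensions (not the exact configs of Llama 3 or Jamba); the exact savings factor depends on the attention-to-Mamba ratio and model widths, so the 8x below is a property of this toy setup, not the article's 3x figure for Jamba:

```python
# Hedged sketch: why attention-sparse hybrids shrink the KV cache.
# The cache stores keys and values for every attention layer at every
# position; Mamba layers keep a fixed-size state instead and store nothing
# per token. All dimensions below are illustrative assumptions.

def kv_cache_bytes(context_len: int, n_attn_layers: int,
                   n_kv_heads: int = 8, head_dim: int = 128,
                   bytes_per_val: int = 2) -> int:
    # Factor 2 covers keys and values; bytes_per_val=2 assumes fp16/bf16.
    return 2 * context_len * n_attn_layers * n_kv_heads * head_dim * bytes_per_val

ctx = 256_000
pure = kv_cache_bytes(ctx, n_attn_layers=32)    # pure Transformer: all layers attend
hybrid = kv_cache_bytes(ctx, n_attn_layers=4)   # hybrid: 1-in-8 layers attend
ratio = pure / hybrid                           # 32/4 = 8x in this toy config
```

At 256K tokens the pure-Transformer cache in this toy config is several gigabytes, which is why reducing the number of attention layers matters so much for long contexts.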

    Jamba vs. Mamba

    Pure Mamba lacks the attention mechanism that underpins strong in-context learning; Jamba restores it with strategically placed attention blocks, improving reasoning and recall.
