Griffin (Google)
Google's hybrid architecture combining a gated linear recurrence (RG-LRU) with local sliding-window attention, productionized as RecurrentGemma.
Explanation
Griffin uses the Real-Gated Linear Recurrent Unit (RG-LRU), an efficient recurrence layer, interleaved with local sliding-window attention. Because both the recurrent state and the local-attention KV cache are fixed in size, inference memory does not grow with sequence length. RecurrentGemma (2B/9B) shows that this hybrid architecture can reach Transformer quality with significantly less inference memory.
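A minimal sketch of the RG-LRU update may make the recurrence concrete. It follows the formulation in the Griffin paper, but the weight names (W_a, b_a, W_x, b_x, Lambda) and shapes are illustrative placeholders, and the real layer computes the decay in log space for numerical stability:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rg_lru(x, W_a, b_a, W_x, b_x, Lambda, c=8.0):
    """Sketch of the RG-LRU recurrence over a sequence x of shape (T, D).

    Update rule (per the Griffin paper):
        r_t = sigmoid(x_t @ W_a + b_a)                  # recurrence gate
        i_t = sigmoid(x_t @ W_x + b_x)                  # input gate
        a_t = a ** (c * r_t), with a = sigmoid(Lambda)  # gated decay
        h_t = a_t * h_{t-1} + sqrt(1 - a_t^2) * (i_t * x_t)

    All parameter names here are illustrative, not the official API.
    """
    T, D = x.shape
    a = sigmoid(Lambda)                 # learnable per-channel decay in (0, 1)
    h = np.zeros(D)                     # fixed-size recurrent state
    out = np.empty_like(x)
    for t in range(T):
        r = sigmoid(x[t] @ W_a + b_a)   # recurrence gate
        i = sigmoid(x[t] @ W_x + b_x)   # input gate
        a_t = a ** (c * r)              # gated decay, stays in (0, 1)
        h = a_t * h + np.sqrt(1.0 - a_t**2) * (i * x[t])
        out[t] = h
    return out
```

The state h has a fixed size D regardless of sequence length, which is where the memory advantage over a growing attention KV cache comes from.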
Marketing Relevance
Griffin/RecurrentGemma is Google's first productionized Transformer alternative – a signal of where hybrid architectures are heading.
Common Pitfalls
Only validated at small scales so far (2B/9B). Community adoption is limited, and Google does not use the architecture for Gemini internally.
Origin & History
De et al. (Google DeepMind, 2024) introduced Griffin along with the Hawk baseline. RecurrentGemma (2024) released Griffin as an open model and showed results competitive with Gemma at significantly lower inference cost.
Comparisons & Differences
Griffin (Google) vs. Jamba
Jamba interleaves Mamba SSM layers with full attention; Griffin pairs a gated linear recurrence (RG-LRU) with local attention – different recurrence mechanisms.
Griffin (Google) vs. Gemma
Gemma is a pure Transformer; Griffin/RecurrentGemma replaces global attention with recurrence and local attention for better inference efficiency.
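To illustrate the difference, here is a small sketch of the causal sliding-window mask that local attention uses in place of a Transformer's full causal mask; `window` is a hyperparameter (RecurrentGemma uses a fixed local window, reportedly 2048 tokens):

```python
import numpy as np

def local_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """True where query position q may attend to key position k.

    Global causal attention is the special case window >= seq_len.
    With a fixed window, the KV cache is capped at `window` entries,
    so inference memory stays constant in sequence length.
    """
    q = np.arange(seq_len)[:, None]  # query positions
    k = np.arange(seq_len)[None, :]  # key positions
    return (k <= q) & (k > q - window)

# Example: each of 6 tokens sees at most the last 3 positions.
print(local_causal_mask(6, 3).astype(int))
```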