Hyena
A subquadratic attention replacement based on implicitly parameterized long convolutions and data-controlled gates, scaling O(N log N) instead of O(N²); particularly strong for DNA and other ultra-long sequences.
Explanation
Hyena replaces attention with a recurrence of long convolutions and element-wise multiplicative gates. The convolution filters are implicitly parameterized (generated by a small network from positional encodings) and span the whole sequence, so they are applied in O(N log N) time via FFT. The gates are linear projections of the input, which lets the operator modulate information flow in a context-dependent way, similar to attention.
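A minimal PyTorch sketch of this pattern, assuming a simplified order-2 block with explicit filters (real Hyena generates its filters implicitly with a small FFN over positional encodings; `ToyHyenaBlock` and `fft_long_conv` are illustrative names, not the paper's code):

```python
import torch

def fft_long_conv(u, h):
    """Causal long convolution via FFT: O(L log L) instead of O(L^2).

    u: (batch, L, d) input sequence
    h: (L, d) one filter per channel, as long as the sequence itself
    """
    L = u.shape[1]
    # Zero-pad to 2L so the circular FFT convolution becomes a linear one.
    u_f = torch.fft.rfft(u, n=2 * L, dim=1)
    h_f = torch.fft.rfft(h, n=2 * L, dim=0)
    y = torch.fft.irfft(u_f * h_f.unsqueeze(0), n=2 * L, dim=1)
    return y[:, :L]  # keep only the causal part

class ToyHyenaBlock(torch.nn.Module):
    """Illustrative order-2 Hyena-style block.

    Projects the input into one value stream and two gate streams, then
    alternates (gate * long-conv) steps: z <- x_n * (h_n conv z).
    Assumption: filters are plain parameters here; the paper parameterizes
    them implicitly with a small FFN over positional encodings.
    """
    def __init__(self, d, L, order=2):
        super().__init__()
        self.order = order
        # One projection per gate plus one for the value stream.
        self.proj_in = torch.nn.Linear(d, d * (order + 1))
        self.proj_out = torch.nn.Linear(d, d)
        self.filters = torch.nn.Parameter(torch.randn(order, L, d) * 0.02)

    def forward(self, u):                         # u: (batch, L, d)
        streams = self.proj_in(u).chunk(self.order + 1, dim=-1)
        v, gates = streams[0], streams[1:]
        z = v
        for n in range(self.order):
            # Gates are projections of the input: data-controlled processing.
            z = gates[n] * fft_long_conv(z, self.filters[n])
        return self.proj_out(z)

block = ToyHyenaBlock(d=64, L=1024)
y = block(torch.randn(2, 1024, 64))
print(y.shape)  # torch.Size([2, 1024, 64])
```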
Marketing Relevance
Hyena has shown strong results on DNA sequences (HyenaDNA) and other ultra-long inputs where quadratic attention becomes infeasible.
Common Pitfalls
Language-modeling quality has not consistently matched equally sized Transformers. FFT-based convolutions also have low arithmetic intensity, so they can underutilize hardware optimized for dense matrix multiplies (e.g. GPU tensor cores).
Origin & History
Poli et al. (Stanford, 2023) introduced the Hyena operator in "Hyena Hierarchy: Towards Larger Convolutional Language Models". HyenaDNA (2023) reached state-of-the-art results on genomics benchmarks with contexts of 1M+ tokens. Together AI built on the operator in its StripedHyena model family.
Comparisons & Differences
Hyena vs. Mamba
Mamba uses selective state-space models: an O(N) recurrence whose state updates depend on the input. Hyena applies fixed, input-independent long filters via FFT in O(N log N); its data dependence comes only from the gates. Mamba has shown stronger language-modeling results, while Hyena-style operators remain popular for genomics; the sketch below contrasts the two computation patterns.
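A toy NumPy sketch of the two computation patterns (scalar state; the decay `a`, input weight `b`, and filter `h` are made-up illustrations, not either library's actual code):

```python
import numpy as np

def selective_scan(u, a, b):
    """Mamba-style pattern: O(N) sequential state update whose decay a[t]
    and input weight b[t] are functions of the input (selectivity).
    Toy scalar-state version; real Mamba uses structured state matrices."""
    state, ys = 0.0, []
    for t in range(len(u)):
        state = a[t] * state + b[t] * u[t]   # input-dependent recurrence
        ys.append(state)
    return np.array(ys)

def fft_conv(u, h):
    """Hyena-style pattern: O(N log N) convolution with one fixed long
    filter h; the filter itself cannot adapt to the input it sees."""
    N = len(u)
    y = np.fft.irfft(np.fft.rfft(u, 2 * N) * np.fft.rfft(h, 2 * N), 2 * N)
    return y[:N]

rng = np.random.default_rng(0)
u = rng.standard_normal(8)
print(selective_scan(u, a=1 / (1 + np.exp(-u)), b=np.ones(8)))
print(fft_conv(u, h=0.9 ** np.arange(8)))
```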