State Space Model (SSM)
A class of sequence models based on continuous state-space theory, offering linear O(N) scaling instead of the quadratic O(N²) of attention.
State space models treat a sequence as a dynamical system evolving over time and form the theoretical basis for Mamba and other Transformer alternatives.
Explanation
SSMs model a sequence as a linear dynamical system: x'(t) = Ax(t) + Bu(t), y(t) = Cx(t), where u is the input signal, x the hidden state, and y the output. Through discretization and a special parameterization of the state matrix (HiPPO, S4), they can capture long-range dependencies efficiently. Mamba extends this with a selective mechanism that makes the parameters input-dependent.
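A minimal numerical sketch of this pipeline (the names discretize and ssm_recurrence are illustrative, not from any library): it applies zero-order-hold discretization to toy A, B, C matrices and runs the resulting linear recurrence over a sequence. Real S4/Mamba implementations use structured, HiPPO-based state matrices and learned step sizes; Mamba additionally makes B, C, and the step size functions of the input.

import numpy as np
from scipy.linalg import expm

def discretize(A, B, dt):
    # Zero-order-hold discretization (assumes A is invertible):
    # A_bar = exp(dt*A), B_bar = A^{-1} (A_bar - I) B
    A_bar = expm(dt * A)
    B_bar = np.linalg.solve(A, (A_bar - np.eye(len(A))) @ B)
    return A_bar, B_bar

def ssm_recurrence(A_bar, B_bar, C, u):
    # Discrete SSM: x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k.
    # One state update per step -> O(N) in sequence length.
    x = np.zeros((A_bar.shape[0], 1))
    ys = []
    for u_k in u:
        x = A_bar @ x + B_bar * u_k
        ys.append((C @ x).item())
    return np.array(ys)

# Toy example: 4-dimensional state, scalar input/output channel.
rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n)) - 3 * np.eye(n)   # shifted left for stability
B = rng.normal(size=(n, 1))
C = rng.normal(size=(1, n))
A_bar, B_bar = discretize(A, B, dt=0.1)
y = ssm_recurrence(A_bar, B_bar, C, rng.normal(size=64))
print(y.shape)   # (64,)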
Marketing Relevance
SSMs are among the most promising Transformer alternatives for tasks with extremely long sequences (audio, genomics, time series), where linear scaling translates directly into lower compute and memory cost.
Common Pitfalls
Not yet fully at Transformer parity on language tasks. Less mature tooling and community than Transformers. Naive implementations are prone to training instabilities; careful initialization and discretization are required.
Origin & History
Gu et al. introduced HiPPO (2020) and S4 (2021). S4 was the first to achieve state-of-the-art results on long-range benchmarks such as Long Range Arena. Mamba (2023) made SSMs competitive for language modeling through its selective mechanism. Mamba-2 and the hybrid Jamba (2024) approached Transformer quality.
Comparisons & Differences
State Space Model (SSM) vs. Transformer
Transformers use attention (O(N²) in sequence length, strong quality); SSMs use recurrence (O(N)), which is more efficient for long sequences but still shows a quality gap on language tasks.
State Space Model (SSM) vs. RNN/LSTM
RNNs suffer from vanishing gradients; SSMs mitigate this through HiPPO initialization and, being linear, can be trained in parallel as a convolution (see the sketch below).
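To make the parallel-training claim concrete, here is a hedged sketch (the names ssm_kernel and ssm_conv are illustrative): unrolling the linear recurrence shows that y is a causal convolution of u with the kernel K[k] = C @ A_bar^k @ B_bar, which can be applied to the whole sequence at once via FFT. S4's key contribution is computing this kernel efficiently through its structured parameterization, rather than the naive matrix powers used here.

import numpy as np

def ssm_kernel(A_bar, B_bar, C, length):
    # Unroll the recurrence into a convolution kernel:
    # K[k] = C @ A_bar^k @ B_bar, so y = K * u (causal convolution).
    K, x = [], B_bar.copy()          # x holds A_bar^k @ B_bar
    for _ in range(length):
        K.append((C @ x).item())
        x = A_bar @ x
    return np.array(K)

def ssm_conv(K, u):
    # Apply the kernel via FFT convolution: parallel over the whole
    # sequence, unlike the step-by-step recurrence.
    n_fft = 2 * len(u)               # padding avoids circular wraparound
    y = np.fft.irfft(np.fft.rfft(K, n_fft) * np.fft.rfft(u, n_fft), n_fft)
    return y[:len(u)]

# Toy check: kernel + convolution reproduce the sequential recurrence.
rng = np.random.default_rng(1)
n, L = 4, 32
A_bar = 0.9 * np.eye(n) + 0.01 * rng.normal(size=(n, n))  # stable discrete A
B_bar = rng.normal(size=(n, 1))
C = rng.normal(size=(1, n))
u = rng.normal(size=L)

y_conv = ssm_conv(ssm_kernel(A_bar, B_bar, C, L), u)

x, y_rec = np.zeros((n, 1)), []
for u_k in u:                        # sequential mode, O(N)
    x = A_bar @ x + B_bar * u_k
    y_rec.append((C @ x).item())
print(np.allclose(y_conv, np.array(y_rec)))  # True

Both modes compute the same outputs: the convolution form is used during training, where the full sequence is available, while the recurrent form enables constant-memory stepwise inference.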