LSTM (Long Short-Term Memory)
LSTM is an RNN variant whose gate mechanisms (forget, input, and output gates) enable learning of long-term dependencies in sequences.
LSTMs mitigated the vanishing gradient problem of vanilla RNNs and were the dominant sequence architecture before Transformers.
Explanation
The gates control which information is retained in the cell state, which new information is added, and which parts are output at each step. Because the cell state is updated additively rather than through repeated matrix multiplications, gradients can flow across many time steps, which mitigates the vanishing gradient problem of vanilla RNNs. LSTMs dominated language processing from roughly 2014 to 2017, when Transformers replaced them.
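To make the gate mechanics concrete, here is a minimal sketch of a single LSTM time step in NumPy. The parameter names (W, U, b) and the stacked-weight layout are illustrative assumptions, not a specific library's API; real implementations (e.g. PyTorch's nn.LSTM) are equivalent in structure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (illustrative sketch).

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked for the forget (f), input (i), output (o) gates and the
    candidate cell update (g), each of hidden size H.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b      # all gate pre-activations at once, shape (4H,)
    f = sigmoid(z[0*H:1*H])           # forget gate: how much of c_prev to keep
    i = sigmoid(z[1*H:2*H])           # input gate: how much new information to add
    o = sigmoid(z[2*H:3*H])           # output gate: how much of the cell to expose
    g = np.tanh(z[3*H:4*H])           # candidate cell update
    c_t = f * c_prev + i * g          # additive cell update -> gradients can flow far back
    h_t = o * np.tanh(c_t)            # hidden state / output at this step
    return h_t, c_t
```

The key line is the cell update `c_t = f * c_prev + i * g`: the addition (rather than a repeated matrix multiplication) is what lets gradients survive over long sequences.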
Marketing Relevance
Historically central for NLP and time-series modeling. Understanding LSTMs helps explain why Transformers were such an advance.
Origin & History
Hochreiter & Schmidhuber (1997) invented the LSTM. It took until around 2014, with the advent of GPU training, for LSTMs to become the standard for NLP, machine translation, and speech recognition. Google Translate switched to an LSTM-based system in 2016. Transformers (2017) have since replaced LSTMs for most tasks.
Comparisons & Differences
LSTM (Long Short-Term Memory) vs. GRU
LSTM has 3 gates (more complex, more expressive); GRU has 2 gates (simpler, faster, similar performance).
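The gate-count difference translates directly into parameter count. A quick way to check this, using PyTorch as an assumed framework (any deep learning library would show the same ratio):

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=256, hidden_size=256)
gru = nn.GRU(input_size=256, hidden_size=256)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(lstm))  # 526336: 4 weight sets (forget, input, output gates + cell candidate)
print(count(gru))   # 394752: 3 weight sets (reset, update gates + candidate) -> exactly 3/4
```

For the same hidden size, the GRU uses 3/4 of the LSTM's parameters, which is where its speed advantage comes from.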
LSTM (Long Short-Term Memory) vs. Transformer
An LSTM processes a sequence step by step (O(n) sequential operations, since each hidden state depends on the previous one); a Transformer processes all positions in parallel with self-attention (O(1) sequential depth, but O(n²) attention cost). Transformers scale better on modern parallel hardware.
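A small PyTorch sketch (the layer sizes are arbitrary assumptions) makes the trade-off visible: the LSTM's recurrence forces a step-by-step loop internally, while self-attention relates all positions in one parallel operation, paying for it with an n × n attention matrix.

```python
import torch
import torch.nn as nn

x = torch.randn(100, 1, 64)  # (seq_len=100, batch=1, features=64)

# LSTM: h_t depends on h_{t-1}, so the 100 steps cannot be parallelized over time
lstm = nn.LSTM(input_size=64, hidden_size=64)
out_lstm, _ = lstm(x)

# Self-attention: every position attends to every other in a single parallel step
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4)
out_attn, weights = attn(x, x, x)
print(weights.shape)  # torch.Size([1, 100, 100]) -- the O(n^2) attention matrix
```

Doubling the sequence length doubles the LSTM's sequential steps but quadruples the attention matrix; Transformers win in practice because the parallel work maps well onto GPUs.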