
    LSTM (Long Short-Term Memory)

    Also known as:
    LSTM
    Long Short-Term Memory
    Updated: 2/9/2026

LSTM is an RNN variant with gate mechanisms (forget, input, and output gates) that enable learning of long-term dependencies in sequences.

    Quick Summary

LSTMs solved the vanishing gradient problem of RNNs with gate mechanisms and were the dominant sequence architecture before Transformers.

    Explanation

The gates control which information is retained, added, or output at each step. Because the cell state is updated additively rather than through repeated multiplication, gradients can flow across many time steps, which mitigates the vanishing gradient problem of vanilla RNNs. LSTMs dominated language processing from roughly 2014 to 2017, until Transformers replaced them.
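A minimal NumPy sketch of one LSTM step may make the gating concrete. Weight shapes, variable names, and the single stacked bias are illustrative assumptions, not tied to any particular library:

```python
import numpy as np

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b stack the forget (f), input (i),
    candidate (g), and output (o) blocks along the first axis."""
    z = W @ x + U @ h_prev + b        # pre-activations, shape (4*H,)
    f, i, g, o = np.split(z, 4)
    f = 1 / (1 + np.exp(-f))          # forget gate: how much of c_prev to keep
    i = 1 / (1 + np.exp(-i))          # input gate: how much new content to add
    o = 1 / (1 + np.exp(-o))          # output gate: how much of the cell to expose
    g = np.tanh(g)                    # candidate cell content
    c = f * c_prev + i * g            # additive cell update (key to gradient flow)
    h = o * np.tanh(c)                # new hidden state
    return h, c

# Tiny usage example with random weights (D = input size, H = hidden size).
rng = np.random.default_rng(0)
D, H = 3, 5
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(7, D)):     # process a sequence of 7 steps, one at a time
    h, c = lstm_cell(x, h, c, W, U, b)
print(h)
```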

    Marketing Relevance

Historically central for NLP and time-series applications. Understanding LSTMs helps explain the advantage Transformers brought.

    Origin & History

Hochreiter & Schmidhuber introduced the LSTM in 1997. It took until around 2014, once GPU training became practical, for LSTMs to become standard for NLP, machine translation, and speech recognition. Google Translate switched to an LSTM-based system in 2016. Transformers (2017) have since replaced LSTMs for most tasks.

    Comparisons & Differences

    LSTM (Long Short-Term Memory) vs. GRU

LSTM has 3 gates (more complex, more expressive); GRU has 2 gates (simpler, faster, with similar performance on many tasks), as the rough count below illustrates.
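To make the size difference concrete, here is a back-of-the-envelope parameter count per cell for input size d and hidden size h, assuming one bias vector per gate block (exact counts vary slightly by library):

```python
def lstm_params(d, h):
    # 4 gate blocks (forget, input, candidate, output), each with an
    # input weight (h x d), a recurrent weight (h x h), and a bias (h).
    return 4 * (h * d + h * h + h)

def gru_params(d, h):
    # 3 blocks (update, reset, candidate) of the same shapes.
    return 3 * (h * d + h * h + h)

# Example: 300-dim inputs, 512-dim hidden state.
print(lstm_params(300, 512))  # LSTM is ~4/3 the size of the matching GRU
print(gru_params(300, 512))
```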

    LSTM (Long Short-Term Memory) vs. Transformer

LSTM processes tokens sequentially (O(n) sequential steps); a Transformer processes them in parallel with self-attention (constant depth per layer, but O(n²) attention cost). Transformers scale better with data and hardware.
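A rough sketch of that trade-off, counting sequential steps and total work for hidden size h and sequence length n (constant factors ignored, purely illustrative):

```python
def lstm_cost(n, h):
    # n sequential time steps, each roughly O(h^2); the recurrence
    # cannot be parallelised across the time dimension.
    return {"sequential_steps": n, "work": n * h * h}

def attention_cost(n, h):
    # One parallel self-attention layer, but the score matrix is n x n,
    # so work and memory grow quadratically in sequence length.
    return {"sequential_steps": 1, "work": n * n * h}

for n in (128, 1024, 8192):
    print(n, lstm_cost(n, 512), attention_cost(n, 512))
```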
