Recurrent Neural Network (RNN)
RNNs process sequences by passing a hidden state from one timestep to the next – the original architecture for language and time-series modeling, now largely replaced by attention-based Transformers.
Explanation
At each timestep, the RNN combines the current input with the previous hidden state to compute a new hidden state. Two problems follow: gradients vanish over long sequences, and the step-by-step dependency makes parallelization across timesteps impossible. LSTMs and GRUs improved on vanilla RNNs, but Transformers surpassed them.
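A minimal sketch of this update in NumPy, assuming toy dimensions; the function and variable names are illustrative, not from any library:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One RNN timestep: combine current input with previous hidden state."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

# Toy dimensions (hypothetical): 4-dim inputs, 8-dim hidden state.
rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 5
W_x = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)  # initial hidden state
sequence = rng.normal(size=(seq_len, input_dim))

# The loop is the crux: step t needs the result of step t-1,
# so timesteps cannot be computed in parallel.
for x_t in sequence:
    h = rnn_step(x_t, h, W_x, W_h, b)

print(h.shape)  # (8,) — final hidden state summarizing the sequence
```

The explicit loop makes the bottleneck concrete: the hidden state at step t cannot be computed until step t-1 has finished.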
Marketing Relevance
RNNs are historically important and still relevant in niches (small devices, real-time streaming). Understanding them also clarifies why Transformers took over.
Origin & History
Elman networks (1990) and Jordan networks were early RNNs. LSTMs (Hochreiter & Schmidhuber, 1997) mitigated the vanishing gradient problem. GRUs (Cho et al., 2014) simplified LSTMs. Seq2Seq with attention (Bahdanau et al., 2014) marked the transition. Transformers (Vaswani et al., 2017) made RNNs obsolete for most tasks.
Comparisons & Differences
Recurrent Neural Network (RNN) vs. Transformer
RNNs process tokens sequentially (slow to train, prone to vanishing gradients); Transformers process the whole sequence in parallel with attention (faster training, better long-range dependencies). A sketch of the parallel side follows below.
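To make the contrast with the RNN loop concrete, here is a minimal scaled dot-product attention in NumPy. Toy dimensions; using Q = K = V = X is an illustrative simplification that omits the learned projections a real Transformer applies:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every position attends to every other at once."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(1)
seq_len, d = 5, 8
X = rng.normal(size=(seq_len, d))  # toy token representations

# One pair of matrix products relates all position pairs simultaneously —
# no timestep loop, so the whole sequence is processed in parallel.
out = attention(X, X, X)
print(out.shape)  # (5, 8)
```

Where the RNN sketch above needs seq_len dependent steps, the matrix products here touch every position pair in one shot.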
Recurrent Neural Network (RNN) vs. LSTM
A vanilla RNN has a single hidden state; an LSTM adds a separate cell state plus gates (forget, input, output) that control what is remembered, giving better long-term dependencies. A gate-by-gate sketch follows.
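A minimal LSTM step in NumPy, assuming the same toy setup as above; the names and the fused weight layout are illustrative, not any specific library's convention:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM timestep with forget, input, and output gates."""
    z = np.concatenate([x_t, h_prev]) @ W + b     # all gate pre-activations at once
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gates squashed into (0, 1)
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g   # forget old memory, write new memory
    h = o * np.tanh(c)       # expose a gated view of the cell state
    return h, c

rng = np.random.default_rng(2)
input_dim, hidden_dim = 4, 8
W = rng.normal(scale=0.1, size=(input_dim + hidden_dim, 4 * hidden_dim))
b = np.zeros(4 * hidden_dim)

h = c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (8,) (8,)
```

The additive cell update `f * c_prev + i * g` is the key design choice: gradients can flow through it across many timesteps far more easily than through the vanilla RNN's repeated tanh recurrence.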