# GRU

## Overview
A from-scratch GRU (Gated Recurrent Unit) implementation. The GRU simplifies the LSTM by merging the cell state and hidden state into a single vector and using just two gates (reset and update), achieving comparable performance with fewer parameters. Based on Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al., 2014).
## Architecture
- Manual gate implementations: reset gate r = σ(W_r x_t + U_r h_{t-1}), update gate z = σ(W_z x_t + U_z h_{t-1})
- Candidate hidden state: h̃ = tanh(W x_t + r ⊙ (U h_{t-1}))
- Final update: h_t = z ⊙ h_{t-1} + (1 − z) ⊙ h̃
- 16 hidden units per layer
- Sequence length: 16
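The gate computations above can be sketched in plain NumPy. This is a minimal illustration, not the repository's code: the parameter names (`W_r`, `U_r`, etc.), the input size of 8, and the random initialization scale are assumptions; only the hidden size (16), sequence length (16), and gate formulas come from this README.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU step. x: (input_size,), h_prev: (hidden_size,).

    Parameter dict keys (W_*, U_*, b_*) are illustrative names, not the
    repository's actual variable names.
    """
    # Reset gate: controls how much of h_prev feeds the candidate state.
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev + p["b_r"])
    # Update gate: interpolates between h_prev and the candidate state.
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev + p["b_z"])
    # Candidate hidden state: h̃ = tanh(W x + r ⊙ (U h_prev)).
    h_tilde = np.tanh(p["W_h"] @ x + r * (p["U_h"] @ h_prev) + p["b_h"])
    # Final update: convex combination controlled by z.
    return z * h_prev + (1.0 - z) * h_tilde

def init_params(input_size, hidden_size, rng, scale=0.1):
    p = {}
    for g in ("r", "z", "h"):
        p[f"W_{g}"] = rng.standard_normal((hidden_size, input_size)) * scale
        p[f"U_{g}"] = rng.standard_normal((hidden_size, hidden_size)) * scale
        p[f"b_{g}"] = np.zeros(hidden_size)
    return p

rng = np.random.default_rng(0)
params = init_params(input_size=8, hidden_size=16, rng=rng)
h = np.zeros(16)
for t in range(16):  # sequence length 16, as listed above
    h = gru_step(rng.standard_normal(8), h, params)
print(h.shape)  # → (16,)
```

Because each new state is a convex combination of the previous state and a tanh-bounded candidate, every hidden unit stays in (−1, 1).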
## Training
| Hyperparameter | Value |
|---|---|
| Epochs | 50 |
| Optimizer | Adam, lr=1e-4 |
| Batch size | 16 |
| Dropout | 0.2 |
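For orientation, here is how the hyperparameters in the table might map onto a training loop. This is a hypothetical sketch, not the repository's script: it substitutes PyTorch's built-in `nn.GRU` for the manual gates, and the input size, regression head, MSE objective, and random placeholder data are all assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyGRU(nn.Module):
    # Stand-in model: the repo implements the gates by hand, but nn.GRU
    # suffices to illustrate the training configuration.
    def __init__(self, input_size=8, hidden_size=16, dropout=0.2):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.drop = nn.Dropout(dropout)        # Dropout 0.2, as in the table
        self.head = nn.Linear(hidden_size, 1)  # hypothetical regression head

    def forward(self, x):
        out, _ = self.gru(x)                      # (batch, seq, hidden)
        return self.head(self.drop(out[:, -1]))   # predict from last step

model = TinyGRU()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, lr=1e-4
loss_fn = nn.MSELoss()                               # hypothetical objective

# Random placeholder data: batch size 16, sequence length 16, 8 features.
x = torch.randn(16, 16, 8)
y = torch.randn(16, 1)

for epoch in range(50):  # 50 epochs, as in the table
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```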
## Results
| Split | Loss |
|---|---|
| Train | 0.51 |
| Validation | 0.48 |
## Paper

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation — Cho et al., 2014