# LSTM

## Overview
A from-scratch LSTM implementation with all four gates (input, forget, cell/candidate, output) written out manually, without using `nn.LSTM`. LSTMs mitigate the vanishing-gradient problem of vanilla RNNs by introducing a cell state that can carry information across long sequences. Based on Long Short-Term Memory (Hochreiter & Schmidhuber, 1997).
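As a rough illustration of how the four gates interact, here is a minimal PyTorch sketch of one cell step; the class name `ManualLSTMCell` and the fused gate projection are assumptions for brevity, not necessarily how this repo structures its code:

```python
import torch
import torch.nn as nn

class ManualLSTMCell(nn.Module):
    """One LSTM time step with the four gates written out (no nn.LSTM)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # One fused projection producing all four gate pre-activations: i, f, g, o.
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x, h, c):
        # Project [input, previous hidden] and split into the four gates.
        z = self.gates(torch.cat([x, h], dim=-1))
        i, f, g, o = z.chunk(4, dim=-1)
        i = torch.sigmoid(i)  # input gate: how much new content to write
        f = torch.sigmoid(f)  # forget gate: how much old cell state to keep
        g = torch.tanh(g)     # cell/candidate gate: the new content itself
        o = torch.sigmoid(o)  # output gate: how much cell state to expose
        c_next = f * c + i * g           # additive cell update eases gradient flow
        h_next = o * torch.tanh(c_next)  # hidden state is the gated cell output
        return h_next, c_next
```

Unrolling this cell over a length-64 sequence in a plain Python loop recovers the full forward pass.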
## Architecture
- Manual gate implementations: input (i), forget (f), cell/candidate (g), output (o)
- 128 hidden units per layer
- Sequence length: 64
- ~128K parameters (see the parameter-count check after this list)
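The ~128K figure can be sanity-checked against the standard single-layer LSTM parameter formula; the input feature size below is a guess chosen to land near that count, not a value stated in this repo:

```python
def lstm_param_count(input_size: int, hidden_size: int) -> int:
    # Four gates, each with an input projection, a recurrent projection,
    # and one bias (PyTorch's nn.LSTM adds a second bias vector per gate).
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

# hidden_size=128 as listed above; input_size=120 is an assumption.
print(lstm_param_count(120, 128))  # 127488, i.e. ~128K
```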
## Training
| Hyperparameter | Value |
|---|---|
| Epochs | 50 |
| Optimizer | Adam, lr=1e-4 |
| Batch size | 32 |
| Dropout | 0.1 |
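A sketch of how these hyperparameters might be wired together; the model, data, and loss below are placeholders, not this repo's actual task:

```python
import torch
import torch.nn as nn

# Stand-in model: any module containing Dropout(0.1) would do here.
model = nn.Sequential(nn.Linear(64, 128), nn.Dropout(0.1), nn.Linear(128, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()  # assumed loss; the repo's objective is not stated

# Dummy data: 320 samples of sequence length 64.
dataset = torch.utils.data.TensorDataset(torch.randn(320, 64), torch.randn(320, 1))
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

for epoch in range(50):      # Epochs: 50
    for x, y in loader:      # Batch size: 32
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```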
## Results
| Split | Loss |
|---|---|
| Train | 0.49 |
| Validation | 0.48 |
## Paper

Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. *Neural Computation*, 9(8), 1735–1780.