LSTM


Overview

A from-scratch LSTM implementation in PyTorch that manually implements the input (i), forget (f), and output (o) gates plus the cell candidate (g), without using nn.LSTM. LSTMs mitigate the vanishing-gradient problem of vanilla RNNs by introducing a cell state that is updated additively and can carry information across long sequences. Based on Long Short-Term Memory (Hochreiter & Schmidhuber, 1997).
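
As a rough sketch of what one manually gated LSTM step can look like (class and variable names here are illustrative, not the repository's actual code):

```python
import torch
import torch.nn as nn

class ManualLSTMCell(nn.Module):
    """One LSTM time step with explicit i, f, g, o computations."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # One affine map per gate/candidate, acting on [x_t, h_{t-1}].
        self.W_i = nn.Linear(input_size + hidden_size, hidden_size)
        self.W_f = nn.Linear(input_size + hidden_size, hidden_size)
        self.W_g = nn.Linear(input_size + hidden_size, hidden_size)
        self.W_o = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h, c):
        z = torch.cat([x, h], dim=-1)
        i = torch.sigmoid(self.W_i(z))  # input gate
        f = torch.sigmoid(self.W_f(z))  # forget gate
        g = torch.tanh(self.W_g(z))     # cell candidate
        o = torch.sigmoid(self.W_o(z))  # output gate
        c = f * c + i * g               # additive cell-state update
        h = o * torch.tanh(c)           # new hidden state
        return h, c
```

The additive `c = f * c + i * g` update is what lets gradient signal persist across many time steps without repeatedly passing through a squashing nonlinearity.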

Architecture

  • Manual implementations of the i, f, and o gates plus the g cell candidate (unrolled over the sequence in the sketch after this list)
  • 128 hidden units per layer
  • Sequence length: 64
  • ~128K parameters
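
Continuing the cell sketch from the Overview, one way to unroll it with the dimensions listed above (128 hidden units, length-64 sequences); the wrapper class is hypothetical:

```python
class ManualLSTMLayer(nn.Module):
    """Unrolls ManualLSTMCell over a (batch, seq_len, input_size) tensor."""

    def __init__(self, input_size: int, hidden_size: int = 128):
        super().__init__()
        self.hidden_size = hidden_size
        self.cell = ManualLSTMCell(input_size, hidden_size)

    def forward(self, x):
        batch, seq_len, _ = x.shape  # e.g. seq_len = 64
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        outputs = []
        for t in range(seq_len):
            h, c = self.cell(x[:, t], h, c)
            outputs.append(h)
        return torch.stack(outputs, dim=1), (h, c)
```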

Training

Hyperparameter   Value
Epochs           50
Optimizer        Adam, lr=1e-4
Batch size       32
Dropout          0.1
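
A minimal training-loop sketch wired to these hyperparameters; `model`, `train_set`, and `criterion` are placeholders, and the 0.1 dropout is assumed to be applied inside the model:

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, criterion, device="cpu"):
    loader = DataLoader(train_set, batch_size=32, shuffle=True)  # batch size 32
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)    # Adam, lr=1e-4
    model.to(device).train()  # dropout (p=0.1) assumed inside the model
    for epoch in range(50):                                      # 50 epochs
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            outputs, _ = model(x)          # (batch, seq_len, hidden)
            loss = criterion(outputs, y)
            loss.backward()
            optimizer.step()
```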

Results

Split        Loss
Train        0.49
Validation   0.48

Paper

Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780.