# LSTM

## Overview
A from-scratch LSTM implementation with all four gates (input, forget, cell/candidate, output) written out manually, without using `nn.LSTM`. LSTMs mitigate the vanishing-gradient problem of vanilla RNNs by introducing a cell state that can carry information across long sequences. Based on Long Short-Term Memory (Hochreiter & Schmidhuber, 1997).
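As a rough illustration of how the four gates interact, here is a minimal PyTorch sketch of one cell step; the class name `ManualLSTMCell` and the fused gate projection are assumptions for brevity, not necessarily how this repo structures its code:

```python
import torch
import torch.nn as nn

class ManualLSTMCell(nn.Module):
    """One LSTM time step with the four gates written out (no nn.LSTM)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # One fused projection producing all four gate pre-activations: i, f, g, o.
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x, h, c):
        # Project [input, previous hidden] and split into the four gates.
        z = self.gates(torch.cat([x, h], dim=-1))
        i, f, g, o = z.chunk(4, dim=-1)
        i = torch.sigmoid(i)  # input gate: how much new content to write
        f = torch.sigmoid(f)  # forget gate: how much old cell state to keep
        g = torch.tanh(g)     # cell/candidate gate: the new content itself
        o = torch.sigmoid(o)  # output gate: how much cell state to expose
        c_next = f * c + i * g           # additive cell update eases gradient flow
        h_next = o * torch.tanh(c_next)  # hidden state is the gated cell output
        return h_next, c_next
```

Unrolling this cell over a length-64 sequence in a plain Python loop recovers the full forward pass.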
## Architecture
- Manual gate implementations: input (i), forget (f), cell/candidate (g), output (o)
- 128 hidden units per layer
- Sequence length: 64
- ~128K parameters (see the parameter-count check after this list)
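The ~128K figure can be sanity-checked against the standard single-layer LSTM parameter formula; the input feature size below is a guess chosen to land near that count, not a value stated in this repo:

```python
def lstm_param_count(input_size: int, hidden_size: int) -> int:
    # Four gates, each with an input projection, a recurrent projection,
    # and one bias (PyTorch's nn.LSTM adds a second bias vector per gate).
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

# hidden_size=128 as listed above; input_size=120 is an assumption.
print(lstm_param_count(120, 128))  # 127488, i.e. ~128K
```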
## Training
| Hyperparameter | Value |
|---|---|
| Epochs | 50 |
| Optimizer | Adam, lr=1e-4 |
| Batch size | 32 |
| Dropout | 0.1 |
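A sketch of how these hyperparameters might be wired together; the model, data, and loss below are placeholders, not this repo's actual task:

```python
import torch
import torch.nn as nn

# Stand-in model: any module containing Dropout(0.1) would do here.
model = nn.Sequential(nn.Linear(64, 128), nn.Dropout(0.1), nn.Linear(128, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()  # assumed loss; the repo's objective is not stated

# Dummy data: 320 samples of sequence length 64.
dataset = torch.utils.data.TensorDataset(torch.randn(320, 64), torch.randn(320, 1))
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

for epoch in range(50):      # Epochs: 50
    for x, y in loader:      # Batch size: 32
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```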
## Results
| Split | Loss |
|---|---|
| Train | 0.49 |
| Validation | 0.48 |
## Paper

Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. *Neural Computation*, 9(8), 1735–1780.