GRU

Sequential Models PyTorch Custom

Overview

From-scratch GRU (Gated Recurrent Unit) implementation. The GRU simplifies the LSTM by merging the cell state and hidden state into a single hidden state and using just two gates (reset and update), achieving comparable performance with fewer parameters. Based on Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al., 2014).
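
For reference, these are the full GRU update equations in the formulation of Cho et al. (2014), with bias terms omitted; here the update gate z_t controls how much of the previous state is carried over:

```latex
\begin{aligned}
r_t &= \sigma\left(W_r x_t + U_r h_{t-1}\right) \\
z_t &= \sigma\left(W_z x_t + U_z h_{t-1}\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + r_t \odot (U_h h_{t-1})\right) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
```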

Architecture

  • Manual gate implementations: reset gate r_t, update gate z_t (see the sketch below)
  • Candidate hidden state: h̃_t = tanh(W x_t + r_t ⊙ (U h_{t-1}))
  • 16 hidden units per layer
  • Sequence length: 16
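
A minimal sketch of what a manually gated GRU cell in PyTorch can look like. This is an illustration, not the repository's actual code: the class name, the W_*/U_* parameter names, and the input size used in the unrolling example are assumptions; only the 16 hidden units and sequence length 16 come from the list above.

```python
import torch
import torch.nn as nn


class GRUCellScratch(nn.Module):
    """GRU cell with manually implemented reset and update gates (hypothetical sketch)."""

    def __init__(self, input_size: int, hidden_size: int = 16):
        super().__init__()
        # Input-to-hidden weights for reset gate r, update gate z, candidate h~
        self.W_r = nn.Linear(input_size, hidden_size)
        self.W_z = nn.Linear(input_size, hidden_size)
        self.W_h = nn.Linear(input_size, hidden_size)
        # Hidden-to-hidden weights (bias already provided by the layers above)
        self.U_r = nn.Linear(hidden_size, hidden_size, bias=False)
        self.U_z = nn.Linear(hidden_size, hidden_size, bias=False)
        self.U_h = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        r = torch.sigmoid(self.W_r(x) + self.U_r(h_prev))           # reset gate
        z = torch.sigmoid(self.W_z(x) + self.U_z(h_prev))           # update gate
        h_tilde = torch.tanh(self.W_h(x) + self.U_h(r * h_prev))    # candidate state
        return z * h_prev + (1 - z) * h_tilde                       # interpolate old/new


# Unrolling the cell over a length-16 sequence (batch, seq_len, features)
cell = GRUCellScratch(input_size=8, hidden_size=16)  # input_size=8 is an assumption
x = torch.randn(4, 16, 8)
h = torch.zeros(4, 16)
for t in range(x.size(1)):
    h = cell(x[:, t, :], h)
```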

Training

Hyperparameter   Value
Epochs           50
Optimizer        Adam, lr=1e-4
Batch size       16
Dropout          0.2
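
A sketch of a training setup consistent with the table above. The dataset, the regression head, and the MSE loss are placeholders rather than the repository's script, and nn.GRU stands in for the custom cell only to keep the example self-contained; the epoch count, learning rate, batch size, and dropout match the values listed.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class GRURegressor(nn.Module):
    """Hypothetical wrapper: GRU encoder + dropout + linear head."""

    def __init__(self, input_size=8, hidden_size=16, dropout=0.2):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        _, h = self.gru(x)                  # h: (num_layers, batch, hidden)
        return self.head(self.drop(h[-1]))  # predict from the final hidden state


torch.manual_seed(0)
X = torch.randn(256, 16, 8)                 # toy data: 256 sequences of length 16
y = torch.randn(256, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = GRURegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()                    # assumed loss; the actual task may differ

for epoch in range(50):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```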

Results

Split        Loss
Train        0.51
Validation   0.48

Paper

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al., 2014)