Seq2Seq

Tags: Sequential Models · PyTorch · Custom

Overview

A from-scratch, GRU-based Seq2Seq model with attention, implementing both the Bahdanau (additive) and Luong (multiplicative) attention variants. It extends the vanilla encoder-decoder by letting the decoder dynamically attend to all encoder hidden states at each decoding step, eliminating the fixed-length context bottleneck. Based on Sequence to Sequence Learning with Neural Networks (Sutskever et al., 2014) plus the attention papers of Bahdanau et al. (2015) and Luong et al. (2015).
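
To make the mechanism concrete, here is a minimal sketch of the Bahdanau (additive) scoring step in PyTorch. The layer names, shapes, and the 128-unit hidden size mirror the description on this page but are otherwise illustrative, not the repository's actual code.

```python
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    """Additive attention: score(s, h) = v^T tanh(W_s s + W_h h).

    Illustrative sketch only; layer names and shapes are assumptions,
    not taken from the repository.
    """
    def __init__(self, hidden_size: int = 128):
        super().__init__()
        self.W_s = nn.Linear(hidden_size, hidden_size, bias=False)  # projects decoder state
        self.W_h = nn.Linear(hidden_size, hidden_size, bias=False)  # projects encoder states
        self.v = nn.Linear(hidden_size, 1, bias=False)              # scoring vector

    def forward(self, dec_state, enc_outputs):
        # dec_state:   (batch, hidden)          current decoder hidden state
        # enc_outputs: (batch, src_len, hidden) all encoder hidden states
        scores = self.v(torch.tanh(
            self.W_s(dec_state).unsqueeze(1) + self.W_h(enc_outputs)
        ))                                      # (batch, src_len, 1)
        weights = torch.softmax(scores, dim=1)  # attention distribution over source positions
        context = (weights * enc_outputs).sum(dim=1)  # (batch, hidden) context vector
        return context, weights.squeeze(-1)
```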

Architecture

  • GRU encoder and decoder
  • 128 hidden units per GRU layer
  • Feed-forward hidden size: 4× the embedding dimension
  • Sequence length: 32
  • Attention: Bahdanau (additive) and Luong (dot/general); both implemented (see the sketch after this list)
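
As referenced in the attention bullet, a comparable sketch of the Luong (multiplicative) variants, assuming the same 128-unit hidden size. The `variant` flag switching between dot and general scoring is a hypothetical convenience, not necessarily how the repository organizes it.

```python
import torch
import torch.nn as nn

class LuongAttention(nn.Module):
    """Multiplicative attention.

    dot:     score(s, h) = s^T h
    general: score(s, h) = s^T W h

    Illustrative sketch; names and structure are assumptions.
    """
    def __init__(self, hidden_size: int = 128, variant: str = "general"):
        super().__init__()
        self.variant = variant
        if variant == "general":
            self.W = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, dec_state, enc_outputs):
        # dec_state:   (batch, hidden)
        # enc_outputs: (batch, src_len, hidden)
        keys = self.W(enc_outputs) if self.variant == "general" else enc_outputs
        scores = torch.bmm(keys, dec_state.unsqueeze(-1))  # (batch, src_len, 1)
        weights = torch.softmax(scores, dim=1)              # distribution over source positions
        context = (weights * enc_outputs).sum(dim=1)        # (batch, hidden)
        return context, weights.squeeze(-1)
```

Dot scoring requires the encoder and decoder hidden sizes to match; the general variant inserts a learned weight matrix, which in principle also accommodates mismatched sizes (the sketch keeps them equal).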

Training

Hyperparameter    Value
Epochs            50
Optimizer         Adam, lr = 1e-4
Batch size        32
Dropout           0.1
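
A sketch of how these hyperparameters might be wired together. The model and dataset below are stand-in placeholders so the loop runs end to end; they are not the repository's encoder-decoder or data pipeline.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model; the real GRU encoder-decoder lives in the repository.
model = nn.Sequential(
    nn.Linear(128, 512),  # illustrative sizes only
    nn.ReLU(),
    nn.Dropout(0.1),      # dropout 0.1, per the table
    nn.Linear(512, 128),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, lr = 1e-4
criterion = nn.MSELoss()  # placeholder loss for the stand-in model

# Dummy data just to make the loop runnable; batch size 32, per the table.
data = TensorDataset(torch.randn(256, 128), torch.randn(256, 128))
loader = DataLoader(data, batch_size=32, shuffle=True)

for epoch in range(50):  # 50 epochs, per the table
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```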

Papers

  • Sequence to Sequence Learning with Neural Networks (Sutskever, Vinyals & Le, 2014)
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau, Cho & Bengio, 2015)
  • Effective Approaches to Attention-based Neural Machine Translation (Luong, Pham & Manning, 2015)