# Seq2Seq

## Overview
From-scratch GRU-based Seq2Seq with attention, implementing both the Bahdanau (additive) and Luong (multiplicative) attention variants. It extends the vanilla encoder-decoder by letting the decoder attend to all encoder hidden states at every decoding step, avoiding the fixed-length context-vector bottleneck. Based on Sequence to Sequence Learning with Neural Networks (Sutskever et al., 2014) and the attention mechanisms of Bahdanau et al. (2015) and Luong et al. (2015).
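The two variants differ only in how the decoder state is scored against the encoder states. A minimal sketch of both, assuming a PyTorch implementation; the module and argument names here are illustrative, not this repo's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BahdanauAttention(nn.Module):
    """Additive attention: score(s, h_i) = v^T tanh(W_s s + W_h h_i)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.W_s = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects the decoder state
        self.W_h = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects the encoder states
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, hidden); enc_states: (batch, src_len, hidden)
        scores = self.v(torch.tanh(self.W_s(dec_state).unsqueeze(1) + self.W_h(enc_states)))
        weights = F.softmax(scores.squeeze(-1), dim=-1)             # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), enc_states)       # (batch, 1, hidden)
        return context.squeeze(1), weights


class LuongAttention(nn.Module):
    """Multiplicative attention: 'dot' is s^T h_i, 'general' is s^T W h_i."""

    def __init__(self, hidden_dim: int, variant: str = "general"):
        super().__init__()
        self.W = nn.Linear(hidden_dim, hidden_dim, bias=False) if variant == "general" else None

    def forward(self, dec_state, enc_states):
        keys = self.W(enc_states) if self.W is not None else enc_states
        scores = torch.bmm(keys, dec_state.unsqueeze(-1)).squeeze(-1)  # (batch, src_len)
        weights = F.softmax(scores, dim=-1)
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        return context, weights
```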
## Architecture
- GRU encoder and decoder
- Hidden size: 128 units per GRU layer
- FFN hidden size: 4 × embedding dimension
- Sequence length: 32
- Attention: Bahdanau (additive) and Luong (dot/general), both implemented
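A sketch of how one decoder step could wire an attention module into the GRU with the sizes listed above (128 hidden units, FFN hidden = 4 × embedding dim); again PyTorch is assumed, and the class and argument names are hypothetical rather than the repo's:

```python
import torch
import torch.nn as nn


class AttnDecoderStep(nn.Module):
    """One decoding step: embed previous token, attend, update GRU state, project to vocab."""

    def __init__(self, vocab_size: int, attention: nn.Module,
                 emb_dim: int = 128, hidden_dim: int = 128, dropout: float = 0.1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout)
        # attention is e.g. BahdanauAttention or LuongAttention from the sketch above
        self.attention = attention
        # GRU input is the token embedding concatenated with the attention context
        self.gru = nn.GRUCell(emb_dim + hidden_dim, hidden_dim)
        # Output FFN with hidden size 4x the embedding dim, per the list above
        self.ffn = nn.Sequential(
            nn.Linear(hidden_dim, 4 * emb_dim),
            nn.ReLU(),
            nn.Linear(4 * emb_dim, vocab_size),
        )

    def forward(self, prev_token, dec_state, enc_states):
        # prev_token: (batch,); dec_state: (batch, hidden); enc_states: (batch, src_len, hidden)
        emb = self.dropout(self.embedding(prev_token))               # (batch, emb_dim)
        context, weights = self.attention(dec_state, enc_states)     # (batch, hidden)
        dec_state = self.gru(torch.cat([emb, context], dim=-1), dec_state)
        logits = self.ffn(dec_state)                                 # (batch, vocab_size)
        return logits, dec_state, weights
```

Scoring with the previous decoder state before the GRU update follows Bahdanau-style decoding; Luong-style attention instead scores with the updated state, which would swap the order of the attention and GRU calls.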
## Training
| Hyperparameter | Value |
|---|---|
| Epochs | 50 |
| Optimizer | Adam, lr=1e-4 |
| Batch size | 32 |
| Dropout | 0.1 |
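A minimal training-loop sketch wiring these hyperparameters together; the model interface, data loader, and padding index are placeholders, and teacher forcing is an assumption not stated in the table:

```python
import torch
import torch.nn as nn


def train(model, train_loader, pad_idx, device="cpu"):
    """Placeholder loop: model(src, tgt) is assumed to return logits aligned with tgt[:, 1:]."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # Adam, lr=1e-4
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)       # ignore padding positions
    for epoch in range(50):                                     # 50 epochs
        model.train()
        total_loss = 0.0
        for src, tgt in train_loader:                           # batches of 32 sequence pairs
            src, tgt = src.to(device), tgt.to(device)
            optimizer.zero_grad()
            logits = model(src, tgt)                            # teacher forcing assumed
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             tgt[:, 1:].reshape(-1))
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch + 1}: loss {total_loss / len(train_loader):.4f}")
```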
## Papers
- Sequence to Sequence Learning with Neural Networks (Sutskever et al., 2014)
- Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015)
- Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)