# Encoder-Decoder

## Overview
A from-scratch LSTM-based encoder-decoder (Seq2Seq) model for German-to-English translation, replicating the architecture from *Sequence to Sequence Learning with Neural Networks* (Sutskever et al., 2014). The architecture predates attention: the encoder compresses the entire source sentence into a single fixed-length context vector (its final hidden and cell states), which is then used to initialise the decoder.
## Architecture
- Deep LSTM encoder and decoder (4 layers each)
- 128 hidden units per layer
- 32-token block size
- No attention: the fixed-length context vector is the information bottleneck
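The architecture above can be sketched in PyTorch as follows. This is an illustrative minimal version, not the repo's actual code; the vocabulary sizes and embedding dimension are assumptions.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Sketch of the 4-layer LSTM encoder-decoder described above.
    src_vocab, tgt_vocab and emb_dim are illustrative assumptions."""

    def __init__(self, src_vocab=8000, tgt_vocab=6000,
                 emb_dim=128, hidden=128, layers=4, dropout=0.2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # Deep LSTMs: 4 layers, 128 hidden units each, dropout 0.2
        self.encoder = nn.LSTM(emb_dim, hidden, layers,
                               dropout=dropout, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden, layers,
                               dropout=dropout, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the full source sentence; only (h, c) survive --
        # this pair is the fixed-length context-vector bottleneck.
        _, (h, c) = self.encoder(self.src_emb(src))
        # The decoder starts from the encoder's final state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), (h, c))
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab)
```

Because encoder and decoder share the same depth and hidden size, the encoder's final `(h, c)` can be passed directly as the decoder's initial state with no projection.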
## Training
| Hyperparameter | Value |
|---|---|
| Dataset | Multi30k-style German–English |
| Epochs | 10 |
| Optimizer | Adam, lr=1e-4 |
| Batch size | 32 |
| Dropout | 0.2 |
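With those hyperparameters, one training epoch looks roughly like the sketch below. It assumes a `model` with the two-argument `(src, tgt)` forward interface and a `loader` yielding padded batches of token ids; `pad_idx` and the teacher-forcing shift are standard choices, not details confirmed by this README.

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, pad_idx=0, lr=1e-4, device="cpu"):
    """One epoch of teacher-forced training (illustrative sketch)."""
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # Adam, lr=1e-4
    crit = nn.CrossEntropyLoss(ignore_index=pad_idx)   # ignore padding
    total = 0.0
    for src, tgt in loader:
        src, tgt = src.to(device), tgt.to(device)
        # Teacher forcing: feed tgt[:, :-1], predict tgt[:, 1:]
        logits = model(src, tgt[:, :-1])
        loss = crit(logits.reshape(-1, logits.size(-1)),
                    tgt[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        total += loss.item()
    return total / len(loader)  # mean batch loss for the epoch
```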
## Results
| Split | Loss |
|---|---|
| Train | 1.38 |
| Validation | 1.39 |
## Paper
*Sequence to Sequence Learning with Neural Networks*, Sutskever et al., 2014