# Encoder-Decoder

## Overview
A from-scratch LSTM-based encoder-decoder (Seq2Seq) model for German-to-English translation, replicating the architecture from *Sequence to Sequence Learning with Neural Networks* (Sutskever et al., 2014). The architecture predates attention: the encoder compresses the entire source sentence into a single fixed-length context vector (its final hidden and cell states), which is then used to initialise the decoder.
## Architecture
- Deep LSTM encoder and decoder (4 layers each)
- 128 hidden units per layer
- 32-token block size
- No attention: the fixed-length context vector is the information bottleneck
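The architecture above can be sketched in PyTorch as follows. This is an illustrative minimal version, not the repo's actual code; the vocabulary sizes and embedding dimension are assumptions.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Sketch of the 4-layer LSTM encoder-decoder described above.
    src_vocab, tgt_vocab and emb_dim are illustrative assumptions."""

    def __init__(self, src_vocab=8000, tgt_vocab=6000,
                 emb_dim=128, hidden=128, layers=4, dropout=0.2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # Deep LSTMs: 4 layers, 128 hidden units each, dropout 0.2
        self.encoder = nn.LSTM(emb_dim, hidden, layers,
                               dropout=dropout, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden, layers,
                               dropout=dropout, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the full source sentence; only (h, c) survive --
        # this pair is the fixed-length context-vector bottleneck.
        _, (h, c) = self.encoder(self.src_emb(src))
        # The decoder starts from the encoder's final state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), (h, c))
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab)
```

Because encoder and decoder share the same depth and hidden size, the encoder's final `(h, c)` can be passed directly as the decoder's initial state with no projection.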
## Training
| Hyperparameter | Value |
|---|---|
| Dataset | Multi30k-style German–English |
| Epochs | 10 |
| Optimizer | Adam, lr=1e-4 |
| Batch size | 32 |
| Dropout | 0.2 |
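With those hyperparameters, one training epoch looks roughly like the sketch below. It assumes a `model` with the two-argument `(src, tgt)` forward interface and a `loader` yielding padded batches of token ids; `pad_idx` and the teacher-forcing shift are standard choices, not details confirmed by this README.

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, pad_idx=0, lr=1e-4, device="cpu"):
    """One epoch of teacher-forced training (illustrative sketch)."""
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # Adam, lr=1e-4
    crit = nn.CrossEntropyLoss(ignore_index=pad_idx)   # ignore padding
    total = 0.0
    for src, tgt in loader:
        src, tgt = src.to(device), tgt.to(device)
        # Teacher forcing: feed tgt[:, :-1], predict tgt[:, 1:]
        logits = model(src, tgt[:, :-1])
        loss = crit(logits.reshape(-1, logits.size(-1)),
                    tgt[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        total += loss.item()
    return total / len(loader)  # mean batch loss for the epoch
```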
## Results
| Split | Loss |
|---|---|
| Train | 1.38 |
| Validation | 1.39 |
## Paper
*Sequence to Sequence Learning with Neural Networks*, Sutskever et al., 2014