# VAE

## Overview
A from-scratch Variational Autoencoder (VAE) trained on CelebA face images at 128×128 resolution. Demonstrates reconstruction, novel face generation by sampling from the Gaussian prior, and latent-space arithmetic (adding/subtracting attribute direction vectors, sketched below). Based on Auto-Encoding Variational Bayes (Kingma & Welling, 2014).
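Both prior sampling and attribute arithmetic reduce to a few lines once a model is trained. A minimal sketch, assuming a trained `vae` exposing `encode(x) -> (mu, logvar)` and `decode(z)`; these names and shapes are illustrative, not necessarily this repo's actual API:

```python
import torch

@torch.no_grad()
def sample_faces(vae, n=16, latent_dim=32, device="cpu"):
    """Generate novel faces by sampling the standard Gaussian prior."""
    z = torch.randn(n, latent_dim, device=device)
    return vae.decode(z)  # (n, 3, 128, 128)

@torch.no_grad()
def attribute_direction(vae, with_attr, without_attr):
    """Estimate an attribute direction as the difference of mean latents.

    with_attr / without_attr: batches of images that do / don't show the
    attribute (e.g. 'Smiling' from the CelebA annotations).
    """
    mu_pos, _ = vae.encode(with_attr)
    mu_neg, _ = vae.encode(without_attr)
    return mu_pos.mean(0) - mu_neg.mean(0)

@torch.no_grad()
def add_attribute(vae, image, direction, strength=1.0):
    """Shift an image's latent along the attribute direction and decode."""
    mu, _ = vae.encode(image)
    return vae.decode(mu + strength * direction)
```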
## Architecture
- Encoder: 4× Conv2d (3→128→256→256→256, stride=2) → Linear → μ and log σ² (32D latent)
- Decoder: Linear → 4× ConvTranspose2d → 128×128 RGB image
- Reparameterisation trick for differentiable sampling
- Loss: MSE reconstruction + KL divergence (see the sketch after this list)
- Activation: LeakyReLU (slope=0.01)
- WandB tracking
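A minimal PyTorch sketch of the layers and loss listed above. Kernel size (4), padding (1), and the output `Sigmoid` are assumptions; the list only fixes channel counts, strides, activations, and latent size:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 32

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        chans = [3, 128, 256, 256, 256]
        # Four stride-2 convs: 128x128 -> 64 -> 32 -> 16 -> 8
        self.encoder = nn.Sequential(*[
            layer
            for c_in, c_out in zip(chans, chans[1:])
            for layer in (nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
                          nn.LeakyReLU(0.01))
        ])
        self.fc_mu = nn.Linear(256 * 8 * 8, LATENT_DIM)
        self.fc_logvar = nn.Linear(256 * 8 * 8, LATENT_DIM)
        self.fc_dec = nn.Linear(LATENT_DIM, 256 * 8 * 8)
        # Four stride-2 transposed convs: 8x8 -> 16 -> 32 -> 64 -> 128
        dec_chans = [256, 256, 256, 128, 3]
        layers = []
        for i, (c_in, c_out) in enumerate(zip(dec_chans, dec_chans[1:])):
            layers.append(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1))
            if i < 3:
                layers.append(nn.LeakyReLU(0.01))
        layers.append(nn.Sigmoid())  # assumed: pixels in [0, 1]
        self.decoder = nn.Sequential(*layers)

    def encode(self, x):
        h = self.encoder(x).flatten(1)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterise(self, mu, logvar):
        # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu, logvar
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps

    def decode(self, z):
        h = self.fc_dec(z).view(-1, 256, 8, 8)
        return self.decoder(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        return self.decode(self.reparameterise(mu, logvar)), mu, logvar

def vae_loss(recon, x, mu, logvar):
    """MSE reconstruction + KL( q(z|x) || N(0, I) ), summed over the batch."""
    recon_loss = F.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl
```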
## Training
| Hyperparameter | Value |
|---|---|
| Dataset | CelebA (202,599 images, 80/20 split) |
| Epochs | 200 (provided checkpoint is from epoch 240) |
| Optimizer | Adam, lr=5e-4 |
| Batch size | 32 |
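A minimal training loop wiring these hyperparameters together, reusing `VAE` and `vae_loss` from the architecture sketch; the CelebA transform, split mechanics, and WandB project name are illustrative assumptions:

```python
import torch
import wandb
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed preprocessing: center-crop the 178x218 CelebA images, resize to 128
tfm = transforms.Compose([
    transforms.CenterCrop(178),
    transforms.Resize(128),
    transforms.ToTensor(),
])
data = datasets.CelebA("data", split="all", transform=tfm, download=True)
n_train = int(0.8 * len(data))  # 80/20 split from the table
train_set, val_set = random_split(data, [n_train, len(data) - n_train])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)

model = VAE().to(device)
opt = torch.optim.Adam(model.parameters(), lr=5e-4)
wandb.init(project="celeba-vae")  # hypothetical project name

for epoch in range(200):
    for x, _ in train_loader:
        x = x.to(device)
        recon, mu, logvar = model(x)
        loss = vae_loss(recon, x, mu, logvar)
        opt.zero_grad()
        loss.backward()
        opt.step()
    wandb.log({"epoch": epoch, "loss": loss.item()})
```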
## Paper
Auto-Encoding Variational Bayes, Kingma & Welling, 2014 ([arXiv:1312.6114](https://arxiv.org/abs/1312.6114))