# GPT

## Overview
A from-scratch PyTorch replication of the original GPT architecture: a decoder-only transformer trained autoregressively on the character-level TinyShakespeare dataset. Based on the paper *Improving Language Understanding by Generative Pre-Training* (Radford et al., OpenAI 2018).
## Architecture
Standard decoder-only transformer stack: causal self-attention, feed-forward sublayers, layer norm, and learned positional embeddings. Trained autoregressively with a cross-entropy language modelling objective on character-level tokens.
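A minimal sketch of this stack is below, assuming illustrative hyperparameter names (`n_embd`, `n_head`, `block_size`) rather than the repo's actual config. Following the original GPT paper, layer norm is applied after each residual sublayer (post-norm):

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask so each position
    can only attend to itself and earlier positions."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        # Strictly upper-triangular mask: True entries are disallowed links.
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):
        T = x.size(1)
        out, _ = self.attn(x, x, x, attn_mask=self.mask[:T, :T])
        return out

class Block(nn.Module):
    """One decoder block: attention and feed-forward sublayers, each with
    a residual connection followed by layer norm (post-norm, as in GPT-1)."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln1 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x):
        x = self.ln1(x + self.attn(x))
        x = self.ln2(x + self.mlp(x))
        return x

class GPT(nn.Module):
    """Token + learned positional embeddings, a stack of decoder blocks,
    and a linear head projecting back to vocabulary logits."""
    def __init__(self, vocab_size, n_embd=128, n_head=4, n_layer=4, block_size=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)  # learned positions
        self.blocks = nn.Sequential(*[Block(n_embd, n_head, block_size) for _ in range(n_layer)])
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        x = self.blocks(x)
        return self.head(x)  # (B, T, vocab_size) logits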
## Training
- Dataset: TinyShakespeare (`/datafolder`)
- Objective: Next-token prediction (causal LM; see the sketch after this list)
- Framework: PyTorch
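A sketch of the character-level training loop, assuming the `GPT` class above, a corpus file at `datafolder/input.txt`, and illustrative hyperparameters; the repo's actual paths and settings may differ. Targets are the inputs shifted one character to the right, scored with cross-entropy:

```python
import torch
import torch.nn.functional as F

# Build a character-level vocabulary from the raw text (assumed file name).
text = open("datafolder/input.txt").read()
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

block_size, batch_size = 256, 32
model = GPT(vocab_size=len(chars), block_size=block_size)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(1000):
    # Sample random windows; targets are inputs shifted right by one char.
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    xb = torch.stack([data[i : i + block_size] for i in ix])
    yb = torch.stack([data[i + 1 : i + 1 + block_size] for i in ix])

    logits = model(xb)  # (B, T, vocab)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))

    opt.zero_grad()
    loss.backward()
    opt.step()
```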
## Paper
*Improving Language Understanding by Generative Pre-Training*, Radford et al., OpenAI, 2018.