# GPT

## Overview
A from-scratch PyTorch replication of the original GPT architecture: a decoder-only transformer trained autoregressively on the character-level TinyShakespeare dataset. Based on the paper *Improving Language Understanding by Generative Pre-Training* (Radford et al., OpenAI 2018).
## Architecture
Standard decoder-only transformer stack: causal self-attention, feed-forward sublayers, layer norm, and learned positional embeddings. Trained autoregressively with a cross-entropy language modelling objective on character-level tokens.
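A minimal sketch of this stack is below, assuming illustrative hyperparameter names (`n_embd`, `n_head`, `block_size`) rather than the repo's actual config. Following the original GPT paper, layer norm is applied after each residual sublayer (post-norm):

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask so each position
    can only attend to itself and earlier positions."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        # Strictly upper-triangular mask: True entries are disallowed links.
        mask = torch.triu(torch.ones(block_size, block_size, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask)

    def forward(self, x):
        T = x.size(1)
        out, _ = self.attn(x, x, x, attn_mask=self.mask[:T, :T])
        return out

class Block(nn.Module):
    """One decoder block: attention and feed-forward sublayers, each with
    a residual connection followed by layer norm (post-norm, as in GPT-1)."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln1 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x):
        x = self.ln1(x + self.attn(x))
        x = self.ln2(x + self.mlp(x))
        return x

class GPT(nn.Module):
    """Token + learned positional embeddings, a stack of decoder blocks,
    and a linear head projecting back to vocabulary logits."""
    def __init__(self, vocab_size, n_embd=128, n_head=4, n_layer=4, block_size=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)  # learned positions
        self.blocks = nn.Sequential(*[Block(n_embd, n_head, block_size) for _ in range(n_layer)])
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        x = self.blocks(x)
        return self.head(x)  # (B, T, vocab_size) logits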
## Training
- Dataset: TinyShakespeare (`/datafolder`)
- Objective: Next-token prediction (causal LM; see the sketch after this list)
- Framework: PyTorch
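A sketch of the character-level training loop, assuming the `GPT` class above, a corpus file at `datafolder/input.txt`, and illustrative hyperparameters; the repo's actual paths and settings may differ. Targets are the inputs shifted one character to the right, scored with cross-entropy:

```python
import torch
import torch.nn.functional as F

# Build a character-level vocabulary from the raw text (assumed file name).
text = open("datafolder/input.txt").read()
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

block_size, batch_size = 256, 32
model = GPT(vocab_size=len(chars), block_size=block_size)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(1000):
    # Sample random windows; targets are inputs shifted right by one char.
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    xb = torch.stack([data[i : i + block_size] for i in ix])
    yb = torch.stack([data[i + 1 : i + 1 + block_size] for i in ix])

    logits = model(xb)  # (B, T, vocab)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))

    opt.zero_grad()
    loss.backward()
    opt.step()
```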
## Paper
*Improving Language Understanding by Generative Pre-Training*, Radford et al., OpenAI, 2018.