Llama
Overview
A from-scratch PyTorch replication of the Llama architecture. Llama improves on the vanilla GPT-style transformer by replacing LayerNorm with RMSNorm, using SwiGLU activations in the feed-forward sublayers, and adopting Rotary Positional Embeddings (RoPE); together these changes improve training stability and efficiency. Based on LLaMA: Open and Efficient Foundation Language Models (Touvron et al., 2023).
Architecture
- Norm: RMSNorm, applied pre-norm (sketch below)
- Activations: SwiGLU feed-forward sublayers (sketch below)
- Position: Rotary Positional Embeddings (RoPE) (sketch below)
- Attention: Grouped-Query Attention (GQA) (sketch below)
- Decoder-only autoregressive stack
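The sketches below are minimal PyTorch illustrations of these components under stated assumptions, not this repo's actual modules; class names, shapes, and hyperparameters (e.g. `eps`) are assumptions. RMSNorm rescales each token's features by their root mean square with a learned gain, with no mean-centering and no bias:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS of the features, no mean-centering."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps                                # assumed epsilon, not taken from this repo
        self.weight = nn.Parameter(torch.ones(dim))   # learned per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); normalize over the feature dimension
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```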
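A SwiGLU feed-forward sublayer gates an up-projection with a SiLU-activated projection before projecting back down. The LLaMA paper sizes the hidden dimension at roughly two-thirds of 4x the model dimension; the sketch leaves `hidden_dim` as a free parameter:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """SwiGLU feed-forward: w2( silu(w1(x)) * w3(x) )."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```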
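RoPE encodes position by rotating each adjacent pair of query/key features by an angle proportional to the token's position. A common complex-number formulation (base 10000 and the `(batch, seq, n_heads, head_dim)` layout are assumptions) looks like:

```python
import torch

def precompute_rope_freqs(head_dim: int, max_seq_len: int, base: float = 10000.0) -> torch.Tensor:
    """Complex rotation factors exp(i * m * theta_j) for every position m and frequency index j."""
    theta = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))  # (head_dim/2,)
    positions = torch.arange(max_seq_len).float()                              # (max_seq_len,)
    angles = torch.outer(positions, theta)                                     # (max_seq_len, head_dim/2)
    return torch.polar(torch.ones_like(angles), angles)                        # complex64

def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    """Rotate queries or keys. x: (batch, seq, n_heads, head_dim)."""
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))  # pair up features
    freqs = freqs[: x.shape[1]].unsqueeze(0).unsqueeze(2)                       # (1, seq, 1, head_dim/2)
    x_rotated = torch.view_as_real(x_complex * freqs).flatten(-2)               # back to real pairs
    return x_rotated.type_as(x)
```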
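Grouped-Query Attention keeps the full set of query heads but shares a smaller set of key/value heads across groups of queries, shrinking the K/V projections (and any KV cache). A minimal causal version, with RoPE application omitted for brevity and all sizes assumed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Causal self-attention where groups of query heads share one key/value head."""
    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        q = self.wq(x).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, s, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each KV head so every query head in a group attends to its shared K/V.
        repeat = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal mask built in
        return self.wo(out.transpose(1, 2).reshape(b, s, -1))
```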
Training
- Dataset: TinyShakespeare
- Objective: Causal language modelling (see the training-loop sketch after this list)
- Framework: PyTorch
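A minimal sketch of the causal language-modelling objective on a character-level TinyShakespeare stream. `text` (the raw corpus string), `model` (a Llama-style decoder returning per-token logits), and all hyperparameters are assumptions, not this repo's actual training script:

```python
import torch
import torch.nn.functional as F

# Assumed: `text` holds the TinyShakespeare corpus, `model` maps (batch, seq) token ids
# to (batch, seq, vocab) logits.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

block_size, batch_size = 256, 32
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def get_batch():
    # Random windows; the target is the input shifted one token to the left.
    starts = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in starts])
    y = torch.stack([data[i + 1 : i + block_size + 1] for i in starts])
    return x, y

for step in range(1000):
    x, y = get_batch()
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```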
Paper
LLaMA: Open and Efficient Foundation Language Models — Touvron et al., 2023