Llama

Language Models · PyTorch · TinyShakespeare
GitHub →

Overview

From-scratch PyTorch replication of the Llama architecture. Llama improved upon vanilla GPT by replacing LayerNorm with RMSNorm, using SwiGLU activations, and adopting Rotary Positional Embeddings (RoPE) — changes that collectively improve training stability and efficiency. Based on LLaMA: Open and Efficient Foundation Language Models (Touvron et al., 2023).
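
A minimal sketch of the first of those changes, assuming standard PyTorch conventions; the class name and constructor signature are illustrative, not necessarily the repo's actual API. RMSNorm (Zhang & Sennrich, 2019) drops LayerNorm's mean-centering and bias, rescaling by the root mean square alone:

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # y = x / sqrt(mean(x^2) + eps) * g, with a learnable gain g and no bias.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain g

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight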

Architecture

  • Norm: RMSNorm (pre-norm)
  • Activations: SwiGLU feed-forward sublayers (sketched after this list)
  • Position: Rotary Positional Embeddings (RoPE; sketched after this list)
  • Attention: Grouped-Query Attention (GQA), a refinement Llama adopted from Llama 2 onwards
  • Decoder-only autoregressive stack
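
The SwiGLU and RoPE bullets translate into compact modules. The sketch below is a minimal version under assumed tensor shapes (batch, seq_len, n_heads, head_dim); the names SwiGLUFeedForward, rope_frequencies, and apply_rope are illustrative, not the repo's actual identifiers.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    # Gated feed-forward sublayer: w2(silu(w1(x)) * w3(x)), all projections bias-free.
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

def rope_frequencies(head_dim: int, seq_len: int, base: float = 10000.0) -> torch.Tensor:
    # One rotation angle per channel pair, per position: theta(p, k) = p * base^(-2k / head_dim).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, head_dim // 2)
    return torch.polar(torch.ones_like(angles), angles)  # unit complex numbers e^(i * theta)

def apply_rope(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, n_heads, head_dim); rotate each channel pair as a complex number.
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    x_rot = x_c * freqs_cis[None, :, None, :]  # broadcast over batch and heads
    return torch.view_as_real(x_rot).flatten(-2).type_as(x)

Applied to the query and key tensors before the attention product, the rotation makes attention scores depend on relative position rather than on an additive learned embedding.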

Training

  • Dataset: TinyShakespeare
  • Objective: Causal language modelling (next-token prediction; a minimal loop is sketched below)
  • Framework: PyTorch
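
A character-level training loop consistent with the bullets above, as a hedged sketch: the file name, the model variable, and the hyperparameters are assumptions, with model standing in for the decoder-only stack described under Architecture.

import torch
import torch.nn.functional as F

text = open("tinyshakespeare.txt").read()        # assumed local path to the corpus
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}     # char-level vocabulary
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

block_size, batch_size, num_steps = 256, 32, 1000  # illustrative hyperparameters
# `model` is assumed to map (batch, block_size) token ids to (batch, block_size, vocab) logits.
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(num_steps):
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])          # inputs
    y = torch.stack([data[i + 1 : i + 1 + block_size] for i in ix])  # next-token targets
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()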

Paper

LLaMA: Open and Efficient Foundation Language Models, Touvron et al., 2023 (arXiv:2302.13971)