LoRA

Fine-tuning · PyTorch · TinyShakespeare

Overview

From-scratch PyTorch implementation of LoRA (Low-Rank Adaptation). Rather than fine-tuning all parameters, LoRA freezes the pre-trained weights and injects trainable low-rank decomposition matrices (A and B) into each attention projection. The update is ΔW = BA, where B ∈ Rᵈˣʳ and A ∈ Rʳˣᵏ, with rank r ≪ min(d, k). Based on LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2022).
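
A minimal sketch of such an adapter layer, assuming a standard nn.Linear base; the alpha/r scaling and the init scheme follow the paper, while the class and argument names here are illustrative rather than taken from the repo:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update ΔW = B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # freeze pre-trained weight (and bias)
        d, k = base.out_features, base.in_features
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)  # random Gaussian init
        self.B = nn.Parameter(torch.zeros(d, r))         # zero init, so ΔW = 0 at the start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen base output plus the low-rank correction x Aᵀ Bᵀ, scaled by alpha / r
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```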

Architecture

  • LoRA adapters injected into the Q and V attention projections (see the injection sketch below)
  • Rank r is configurable; the original pre-trained weights stay frozen
  • B is zero-initialised and A is random Gaussian, so ΔW = 0 at the start of training
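
The injection step can be sketched as follows; the q_proj / v_proj attribute names are hypothetical and would need to match whatever attention module the model actually uses:

```python
def inject_lora(model: nn.Module, r: int = 8, alpha: float = 16.0) -> nn.Module:
    """Freeze every pre-trained weight, then wrap the attention Q and V projections."""
    for p in model.parameters():
        p.requires_grad_(False)                          # freeze the whole backbone
    for module in model.modules():
        for name in ("q_proj", "v_proj"):                # hypothetical attribute names
            if hasattr(module, name) and isinstance(getattr(module, name), nn.Linear):
                setattr(module, name, LoRALinear(getattr(module, name), r=r, alpha=alpha))
    return model
```

Because the LoRA parameters are created after the freeze, A and B are the only tensors left with requires_grad=True.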

Training

Hyperparameter   Value
Dataset          TinyShakespeare
Steps            1,000 (validation every 100 steps)
Hardware         A100 GPU
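
With the adapters in place, only A and B carry gradients, so the optimiser sees a small parameter set. A rough loop under the settings above, assuming a nanoGPT-style model that returns (logits, loss); get_batch, estimate_loss, and the learning rate are placeholders, not values from the repo:

```python
trainable = [p for p in model.parameters() if p.requires_grad]  # only the LoRA A and B matrices
optimizer = torch.optim.AdamW(trainable, lr=3e-4)               # assumed learning rate

for step in range(1, 1001):                  # 1,000 steps, as in the table above
    xb, yb = get_batch("train")              # placeholder data loader
    logits, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:                      # validation every 100 steps
        print(f"step {step}: val loss {estimate_loss(model):.2f}")  # placeholder eval helper
```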

Results

Split        Loss
Train        3.51
Validation   3.50

Paper

LoRA: Low-Rank Adaptation of Large Language Models — Hu et al., 2022