LoRA
Overview
From-scratch PyTorch implementation of LoRA (Low-Rank Adaptation). Rather than fine-tuning all parameters, LoRA freezes the pre-trained weights and injects trainable low-rank decomposition matrices (A and B) into each attention projection. For a frozen weight W ∈ Rᵈˣᵏ, the update is ΔW = BA, where B ∈ Rᵈˣʳ and A ∈ Rʳˣᵏ, with rank r ≪ min(d, k). Based on LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2022).
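A minimal sketch of what such a layer can look like in PyTorch. The class name, hyperparameters (`r`, `alpha`), and the Gaussian scale for A are illustrative assumptions, not taken from this repo:

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update BA."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False        # freeze the pre-trained weight W
        # A ∈ R^{r×k} gets a small Gaussian init, B ∈ R^{d×r} starts at zero, so BA = 0 initially
        self.lora_A = nn.Parameter(torch.randn(r, in_features) / math.sqrt(r))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W x + (alpha / r) * B A x
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```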
Architecture
- LoRA adapters injected into Q, V projections
- Rank r configurable; original weights frozen
- Zero-initialised B, random Gaussian A, so ΔW = 0 at the start of training (see the injection sketch below)
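A hedged sketch of how the adapters might be wired into the attention projections; the attribute names (`blocks`, `attn.q_proj`, `attn.v_proj`) are assumptions about the base model, not the repo's actual API:

```python
def add_lora_adapters(model: nn.Module, r: int = 8, alpha: int = 16) -> nn.Module:
    for p in model.parameters():
        p.requires_grad = False                       # freeze every pre-trained weight
    for block in model.blocks:                        # assumed list of transformer blocks
        for name in ("q_proj", "v_proj"):             # adapt only the Q and V projections
            old = getattr(block.attn, name)
            lora = LoRALinear(old.in_features, old.out_features, r=r, alpha=alpha)
            lora.base.weight.data.copy_(old.weight.data)   # carry over the frozen base weight
            setattr(block.attn, name, lora)
    return model
```

Because the base weights are frozen before the adapters are attached, only the newly added `lora_A` / `lora_B` parameters remain trainable.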
Training
| Hyperparameter | Value |
|---|---|
| Dataset | TinyShakespeare |
| Steps | 1,000 (val every 100) |
| Hardware | A100 GPU |
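A rough sketch of the fine-tuning loop under these settings: only the trainable (LoRA) parameters are handed to the optimizer. The data iterator, learning rate, and loss reshaping are placeholders, not values from this repo:

```python
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=3e-4)     # assumed optimizer and learning rate

for step in range(1, 1001):                           # 1,000 steps
    x, y = next(train_iter)                           # batch from TinyShakespeare
    logits = model(x)
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), y.view(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:                               # validation pass would run here
        print(step, loss.item())
```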
Results
| Split | Loss |
|---|---|
| Train | 3.51 |
| Validation | 3.50 |
Paper
LoRA: Low-Rank Adaptation of Large Language Models — Hu et al., 2022