BERT

Language Models · PyTorch · Cornell Movie Dialogs

Overview

A from-scratch PyTorch replication of BERT. Unlike decoder-only models, which attend only to preceding tokens, BERT conditions on both left and right context and is pre-trained with masked language modelling (MLM), making it a strong encoder backbone for classification and retrieval. Based on BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2019).

Architecture

A full bidirectional transformer encoder: there is no causal mask, so every token attends to the entire sequence. The model is trained with the MLM objective, in which 15% of input tokens are masked and the model reconstructs them from the surrounding context.
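
A minimal sketch of how the MLM inputs and labels could be built, using the 15% masking rate above and the 80/10/10 replacement rule from the paper (80% [MASK], 10% random token, 10% unchanged). The function name, special-token ids, and vocabulary size are illustrative assumptions, not the repository's actual code.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, special_ids, mlm_prob=0.15):
    """Prepare (inputs, labels) for masked language modelling.

    Illustrative sketch: 15% of non-special tokens are selected; of those,
    80% become [MASK], 10% become a random token, and 10% stay unchanged.
    Unselected positions get label -100 so the loss ignores them.
    """
    labels = input_ids.clone()

    # Sample which positions to predict (never special tokens like [CLS]/[SEP]/[PAD]).
    prob = torch.full(labels.shape, mlm_prob)
    special = torch.zeros_like(labels, dtype=torch.bool)
    for sid in special_ids:
        special |= labels == sid
    prob.masked_fill_(special, 0.0)
    selected = torch.bernoulli(prob).bool()
    labels[~selected] = -100  # only selected positions contribute to the loss

    inputs = input_ids.clone()
    # 80% of selected positions -> [MASK]
    mask_idx = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & selected
    inputs[mask_idx] = mask_token_id
    # 10% of selected positions -> random token (half of the remaining 20%)
    rand_idx = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & selected & ~mask_idx
    inputs[rand_idx] = torch.randint(vocab_size, labels.shape)[rand_idx]
    # the remaining 10% keep the original token
    return inputs, labels
```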

Training

  • Dataset: Cornell Movie-Dialogs Corpus
  • Objective: Masked Language Modelling (MLM); a minimal training-step sketch follows this list
  • Framework: PyTorch
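
As a rough sketch of one MLM training step, the snippet below uses a toy encoder built from nn.TransformerEncoder; the class name, layer sizes, and dummy batch are assumptions for illustration, and the repository's own model and hyper-parameters may differ. Note that no causal attention mask is passed, so attention is fully bidirectional, and the loss is computed only over masked positions via ignore_index=-100.

```python
import torch
import torch.nn as nn

class TinyBert(nn.Module):
    """Toy bidirectional encoder with an MLM head (illustrative, not the repo's model)."""
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=4, max_len=128):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # no causal mask: bidirectional
        self.mlm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, pad_mask=None):
        pos = torch.arange(input_ids.size(1), device=input_ids.device)
        h = self.tok(input_ids) + self.pos(pos)
        h = self.encoder(h, src_key_padding_mask=pad_mask)  # attends over the whole sequence
        return self.mlm_head(h)  # (batch, seq_len, vocab_size)

# One training step: cross-entropy only over masked positions (labels == -100 are ignored).
vocab_size = 30000
model = TinyBert(vocab_size)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

inputs = torch.randint(vocab_size, (8, 64))                          # stand-in for masked input ids
labels = torch.full((8, 64), -100); labels[:, ::7] = inputs[:, ::7]  # stand-in MLM labels
logits = model(inputs)
loss = loss_fn(logits.view(-1, vocab_size), labels.view(-1))
loss.backward()
optimizer.step()
```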

Paper

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding — Devlin et al., 2019