Gemma

Language Models PyTorch TinyShakespeare
GitHub →

Overview

PyTorch replication of Google's Gemma decoder-only architecture. Gemma builds on the Llama recipe, swapping in multi-query attention (query heads share a single key/value head, shrinking the KV cache) and GeGLU feed-forward activations, for better inference efficiency at comparable quality. Based on Gemma: Open Models Based on Gemini Research and Technology (Gemma Team, Google DeepMind, 2024).

Architecture

  • Multi-query attention (MQA)
  • GeGLU feed-forward sublayers
  • RMSNorm (pre-norm)
  • Rotary Positional Embeddings (RoPE)
  • Decoder-only autoregressive stack
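The building blocks above can be sketched in PyTorch. This is a minimal illustration of RMSNorm, a GeGLU feed-forward sublayer, and multi-query attention; the class names, dimensions, and omission of RoPE are simplifications for clarity, not the repo's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Pre-norm RMSNorm: rescale by the reciprocal RMS, then a learned gain."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class GeGLU(nn.Module):
    """GeGLU feed-forward: gelu(x W_gate) * (x W_up), projected back down."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.gelu(self.gate(x)) * self.up(x))

class MultiQueryAttention(nn.Module):
    """MQA: many query heads attend over a single shared key/value head."""
    def __init__(self, dim, n_heads):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.q = nn.Linear(dim, dim, bias=False)
        self.kv = nn.Linear(dim, 2 * self.head_dim, bias=False)  # one K/V head
        self.o = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        B, T, D = x.shape
        q = self.q(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv(x).split(self.head_dim, dim=-1)
        # Broadcast the single K/V head across all query heads (no copy).
        k = k.unsqueeze(1).expand(B, self.n_heads, T, self.head_dim)
        v = v.unsqueeze(1).expand(B, self.n_heads, T, self.head_dim)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o(y.transpose(1, 2).reshape(B, T, D))
```

A decoder block then composes these in pre-norm style: `x = x + attn(norm(x))` followed by `x = x + ffn(norm(x))`.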

Training

  • Dataset: TinyShakespeare
  • Objective: Causal language modelling
  • Framework: PyTorch
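A single causal-LM training step on character-level text can be sketched as follows. The `TinyLM` stand-in model, the sample string, and the hyperparameters are all illustrative; the repo trains its Gemma-style stack on the full TinyShakespeare corpus.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Character-level vocabulary from a toy snippet (TinyShakespeare in the repo).
text = "To be, or not to be, that is the question."
vocab = sorted(set(text))
stoi = {c: i for i, c in enumerate(vocab)}
ids = torch.tensor([stoi[c] for c in text])

class TinyLM(nn.Module):
    """Hypothetical stand-in for the Gemma-style decoder stack."""
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        return self.head(self.emb(x))

model = TinyLM(len(vocab))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Causal LM objective: predict token t+1 from tokens up to t.
x, y = ids[None, :-1], ids[None, 1:]
logits = model(x)
loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))

opt.zero_grad()
loss.backward()
opt.step()
```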

Paper

Gemma: Open Models Based on Gemini Research and Technology — Gemma Team, Google DeepMind, 2024