# Gemma

## Overview
A PyTorch replication of Google's Gemma, a decoder-only transformer architecture. Gemma builds on the Llama recipe, swapping in multi-query attention and GeGLU feed-forward activations for better inference efficiency at comparable quality. Based on *Gemma: Open Models Based on Gemini Research and Technology* (Gemma Team, Google DeepMind, 2024).
## Architecture
- Multi-query attention (MQA): a single key/value head shared across all query heads
- GeGLU feed-forward sublayers
- RMSNorm (pre-norm)
- Rotary Positional Embeddings (RoPE)
- Decoder-only autoregressive stack
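The components above can be sketched as a single pre-norm decoder block. This is a minimal illustrative sketch, not the repo's actual code: RoPE is omitted for brevity, and all dimensions and class names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """RMSNorm: scale by reciprocal root-mean-square, no mean subtraction."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

class GeGLU(nn.Module):
    """GeGLU feed-forward: GELU-activated gate multiplied elementwise with an up-projection."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.gelu(self.gate(x)) * self.up(x))

class MQA(nn.Module):
    """Multi-query attention: n_heads query heads share one key/value head."""
    def __init__(self, dim, n_heads):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.q = nn.Linear(dim, dim, bias=False)
        self.kv = nn.Linear(dim, 2 * self.head_dim, bias=False)
        self.o = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        B, T, C = x.shape
        q = self.q(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv(x).split(self.head_dim, dim=-1)
        # unsqueeze a head axis so the single K/V head broadcasts over query heads
        k, v = k.unsqueeze(1), v.unsqueeze(1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o(out.transpose(1, 2).reshape(B, T, C))

class DecoderBlock(nn.Module):
    """Pre-norm residual block: x + attn(norm(x)), then x + ffn(norm(x))."""
    def __init__(self, dim=64, n_heads=4):
        super().__init__()
        self.norm1, self.norm2 = RMSNorm(dim), RMSNorm(dim)
        self.attn = MQA(dim, n_heads)
        self.ffn = GeGLU(dim, 4 * dim)

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        return x + self.ffn(self.norm2(x))
```

Because MQA keeps only one key/value head, its KV cache is `n_heads` times smaller than standard multi-head attention, which is where the inference-efficiency gain comes from.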
## Training
- Dataset: TinyShakespeare
- Objective: Causal language modelling
- Framework: PyTorch
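A causal-LM training loop in this spirit might look like the following sketch. The toy corpus, placeholder model, and hyperparameters are assumptions standing in for TinyShakespeare and the repo's decoder stack.

```python
import torch
import torch.nn.functional as F

# Toy char-level corpus standing in for TinyShakespeare (assumption).
text = "to be or not to be"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in text])

block_size = 8
# Placeholder model; the repo's Gemma decoder stack would go here.
model = torch.nn.Sequential(
    torch.nn.Embedding(len(vocab), 32),
    torch.nn.Linear(32, len(vocab)),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(50):
    # Sample random windows; targets are inputs shifted one position left.
    i = torch.randint(0, len(data) - block_size - 1, (4,))
    x = torch.stack([data[j:j + block_size] for j in i])
    y = torch.stack([data[j + 1:j + block_size + 1] for j in i])
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, len(vocab)), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The one-position shift between inputs and targets is what makes the objective causal: each position predicts the next token given only the tokens before it.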
## Paper

Gemma Team, Google DeepMind (2024). *Gemma: Open Models Based on Gemini Research and Technology*.