Transformer


Overview

A full encoder-decoder transformer for English-to-Hindi neural machine translation, dubbed SmolTransformer. It replicates the architecture from Attention Is All You Need (Vaswani et al., 2017), and the trained model is published on HuggingFace.
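The overall shape of such a model can be sketched with PyTorch's built-in `nn.Transformer`. This is an illustrative skeleton, not the repository's code: the layer counts and vocabulary size come from the spec below, while `d_model=512` and `nhead=8` are assumptions (the paper's base configuration), and the learned positional embedding here is a stand-in for the model's fixed sinusoidal tables.

```python
import torch
import torch.nn as nn

class SmolTransformerSketch(nn.Module):
    """Hypothetical skeleton of a 6+6 layer encoder-decoder translator."""

    def __init__(self, vocab_size=30000, d_model=512, nhead=8, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stand-in learned positions; the real model uses sinusoidal tables.
        self.pos_embed = nn.Embedding(max_len, d_model)
        self.core = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=6, num_decoder_layers=6,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def _embed(self, ids):
        positions = torch.arange(ids.size(1), device=ids.device)
        return self.embed(ids) + self.pos_embed(positions)

    def forward(self, src_ids, tgt_ids):
        # Causal mask so each target token only attends to earlier ones.
        causal = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        out = self.core(self._embed(src_ids), self._embed(tgt_ids), tgt_mask=causal)
        return self.lm_head(out)  # (batch, tgt_len, vocab_size) logits
```

During training the decoder receives the shifted target sequence (teacher forcing); at inference tokens are generated one at a time from the logits.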

Architecture

  • 6-layer encoder + 6-layer decoder
  • Multi-head self-attention + cross-attention
  • Sinusoidal positional embeddings
  • ~25M parameters, 512-token context window
  • IndicBARTSS tokenizer (~30K vocab)
  • Supports top-K sampling and beam search at inference

Training

  • Dataset: Samanantar (large-scale English–Hindi parallel corpus)
  • Techniques: Automatic mixed precision, gradient accumulation
  • Tracking: WandB (loss, perplexity, gradient norms)
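The mixed-precision and gradient-accumulation techniques above combine into a fairly standard PyTorch training loop. A hedged sketch with a toy model (the accumulation factor, model, and data here are placeholders, not the project's actual values):

```python
import torch
import torch.nn as nn

# Toy stand-ins; the real project trains the transformer on Samanantar batches.
model = nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.MSELoss()

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model.to(device)

# GradScaler guards fp16 against underflow; it is a no-op when disabled (CPU).
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)
accum_steps = 4  # assumed accumulation factor

batches = [(torch.randn(8, 16), torch.randn(8, 16)) for _ in range(8)]

optimizer.zero_grad(set_to_none=True)
for step, (x, y) in enumerate(batches):
    x, y = x.to(device), y.to(device)
    # Autocast runs the forward pass in fp16 on GPU (bf16 on CPU).
    with torch.autocast(device_type=device,
                        dtype=torch.float16 if use_cuda else torch.bfloat16):
        loss = loss_fn(model(x), y) / accum_steps  # scale for accumulation
    scaler.scale(loss).backward()                  # gradients accumulate
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)   # unscales grads; skips step on inf/nan
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```

Dividing the loss by `accum_steps` keeps the accumulated gradient equivalent to one large-batch step, which is the usual way to simulate bigger effective batch sizes than GPU memory allows.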

Published Model

HuggingFace — YuvrajSingh9886/SmolTransformer

Paper

Attention Is All You Need — Vaswani et al., 2017