Transformer
Overview
A full encoder-decoder Transformer for English-to-Hindi neural machine translation, dubbed SmolTransformer. It replicates the architecture of Attention Is All You Need (Vaswani et al., 2017) and is published on HuggingFace.
Architecture
- 6-layer encoder + 6-layer decoder
- Multi-head self-attention + cross-attention
- Sinusoidal positional encodings (sketched after this list)
- ~25M parameters, 512-token context window
- IndicBARTSS tokenizer (~30K vocab)
- Supports top-K sampling and beam search at inference (top-K sampling sketched after this list)
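A minimal sketch of the fixed sinusoidal positional encodings from the paper. The function name is illustrative, and `d_model=512` is an assumption; only the 512-token context window is stated above.

```python
import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal encodings: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)),
    PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))."""
    position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe                                      # added to the token embeddings

pe = sinusoidal_positional_encoding(max_len=512, d_model=512)  # d_model assumed, not stated
```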
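Of the two decoding strategies, top-K sampling is the simpler to sketch. This is an illustrative single-step implementation, not the repo's actual code; beam search is omitted for brevity.

```python
import torch

@torch.no_grad()
def top_k_sample(logits: torch.Tensor, k: int = 50, temperature: float = 1.0) -> torch.Tensor:
    """One decoding step: sample the next token id from the k highest logits.

    logits: (batch, vocab_size) decoder output for the current position.
    Returns: (batch, 1) sampled token ids.
    """
    topk_vals, topk_idx = torch.topk(logits / temperature, k, dim=-1)
    probs = torch.softmax(topk_vals, dim=-1)          # renormalize over the top k only
    choice = torch.multinomial(probs, num_samples=1)  # sample a position within the top k
    return topk_idx.gather(-1, choice)                # map back to vocabulary ids
```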
Training
- Dataset: Samanantar (large-scale English–Hindi parallel corpus)
- Techniques: automatic mixed precision and gradient accumulation (training-loop sketch after this list)
- Tracking: WandB (loss, perplexity, gradient norms)
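The two training techniques combine naturally in one loop. The sketch below shows the standard PyTorch pattern; the toy model, random batches, and `accum_steps=4` are hypothetical stand-ins for SmolTransformer, the Samanantar pipeline, and the actual hyperparameters.

```python
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

# Hypothetical stand-ins: a toy classifier and random batches in place of
# SmolTransformer and the Samanantar data pipeline.
model = nn.Linear(512, 30_000).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)
accum_steps = 4  # hypothetical; effective batch = accum_steps x per-step batch

for step in range(16):
    x = torch.randn(32, 512, device=device)
    y = torch.randint(0, 30_000, (32,), device=device)
    with torch.cuda.amp.autocast(enabled=use_cuda):      # forward + loss in mixed precision
        loss = nn.functional.cross_entropy(model(x), y) / accum_steps
    scaler.scale(loss).backward()                        # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                           # unscale grads, then optimizer step
        scaler.update()                                  # adjust the loss scale
        optimizer.zero_grad(set_to_none=True)
```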
Published Model
HuggingFace — YuvrajSingh9886/SmolTransformer
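Since SmolTransformer is a custom architecture, it likely loads as a raw checkpoint rather than via transformers' AutoModel. A minimal fetch with huggingface_hub might look like this; the checkpoint filename is an assumption, so check the repo's file list.

```python
import torch
from huggingface_hub import hf_hub_download

# Hypothetical filename; the repo's file list has the real checkpoint name.
ckpt_path = hf_hub_download(
    repo_id="YuvrajSingh9886/SmolTransformer",
    filename="pytorch_model.bin",
)
state_dict = torch.load(ckpt_path, map_location="cpu")  # then load into the model class
```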
Paper
Vaswani et al. (2017). Attention Is All You Need. NeurIPS 2017. arXiv:1706.03762.