Attention Mechanisms
Overview
Standalone implementations of the two seminal attention mechanisms that preceded the Transformer and form the conceptual foundation for modern attention: Bahdanau's additive (MLP) scoring and Luong's multiplicative (dot-product / general) scoring.
Implemented
Bahdanau Attention (Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., 2015)
- Additive (MLP) scoring function: score(s, h) = vᵀ tanh(W_s·s + W_h·h)
- Learns an encoder-decoder alignment, removing the fixed-length context-vector bottleneck (see the sketch below)
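
A minimal PyTorch sketch of the additive scoring above; the module name, dimension arguments, and batch-first tensor layout are assumptions for illustration, not this repo's actual interface:

```python
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Bahdanau-style scoring: score(s, h) = v^T tanh(W_s s + W_h h), softmax-normalized over source positions."""

    def __init__(self, dec_dim: int, enc_dim: int, attn_dim: int):
        super().__init__()
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)   # projects the decoder state s
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)   # projects the encoder states h
        self.v = nn.Linear(attn_dim, 1, bias=False)           # scoring vector v

    def forward(self, s, h):
        # s: (batch, dec_dim) current decoder state
        # h: (batch, src_len, enc_dim) all encoder states
        scores = self.v(torch.tanh(self.W_s(s).unsqueeze(1) + self.W_h(h)))  # (batch, src_len, 1)
        weights = torch.softmax(scores.squeeze(-1), dim=-1)                  # alignment weights
        context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)              # (batch, enc_dim)
        return context, weights


# Example usage with arbitrary dimensions:
# attn = AdditiveAttention(dec_dim=256, enc_dim=512, attn_dim=128)
# context, weights = attn(torch.randn(4, 256), torch.randn(4, 10, 512))
```

Because the scoring MLP projects decoder and encoder states separately before the tanh, the decoder and encoder hidden sizes do not need to match.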
Luong Attention (Effective Approaches to Attention-based Neural Machine Translation, Luong et al., 2015)
- Three scoring variants: dot, general, and concat (sketched below)
- Global and local attention modes
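
A minimal sketch of the three scoring variants in the global-attention setting; the class name, the shared hidden size for encoder and decoder (required by dot and general), and the PyTorch framing are assumptions for illustration:

```python
import torch
import torch.nn as nn


class LuongAttention(nn.Module):
    """Global attention with 'dot', 'general', or 'concat' scoring."""

    def __init__(self, dim: int, method: str = "general"):
        super().__init__()
        self.method = method
        if method == "general":
            self.W_a = nn.Linear(dim, dim, bias=False)        # score = s^T W_a h
        elif method == "concat":
            self.W_a = nn.Linear(2 * dim, dim, bias=False)    # score = v^T tanh(W_a [s; h])
            self.v = nn.Linear(dim, 1, bias=False)

    def score(self, s, h):
        # s: (batch, dim) decoder state, h: (batch, src_len, dim) encoder states
        if self.method == "dot":
            return torch.bmm(h, s.unsqueeze(-1)).squeeze(-1)              # (batch, src_len)
        if self.method == "general":
            return torch.bmm(h, self.W_a(s).unsqueeze(-1)).squeeze(-1)
        # concat: score each (s, h_j) pair with a small MLP
        s_exp = s.unsqueeze(1).expand(-1, h.size(1), -1)
        return self.v(torch.tanh(self.W_a(torch.cat([s_exp, h], dim=-1)))).squeeze(-1)

    def forward(self, s, h):
        weights = torch.softmax(self.score(s, h), dim=-1)                 # alignment weights
        context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)           # weighted sum of encoder states
        return context, weights
```

The sketch covers only global attention, which attends over all source positions; local attention restricts the softmax to a window around a predicted source position.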