Attention Mechanisms


Overview

Standalone implementations of the two seminal pre-Transformer attention mechanisms: Bahdanau’s additive (MLP) scoring and Luong’s multiplicative (dot-product / general) scoring. Together they form the conceptual foundation of all modern attention.

Implemented

Bahdanau Attention (Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., 2015)

  • Additive (MLP) scoring function: score(s, h) = vᵀ tanh(Ws·s + Wh·h)
  • Enables encoder-decoder alignment without fixed-length bottleneck
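The additive scoring above can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's actual API; the class, argument names, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    """Additive (MLP) attention: score(s, h) = v^T tanh(W_s·s + W_h·h)."""

    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)  # projects decoder state s
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)  # projects encoder states h
        self.v = nn.Linear(attn_dim, 1, bias=False)          # scores the tanh-activated sum

    def forward(self, s, h):
        # s: (batch, dec_dim) current decoder state
        # h: (batch, src_len, enc_dim) all encoder states
        scores = self.v(torch.tanh(self.W_s(s).unsqueeze(1) + self.W_h(h)))  # (batch, src_len, 1)
        weights = torch.softmax(scores.squeeze(-1), dim=-1)                  # (batch, src_len)
        # Context vector: attention-weighted sum of encoder states
        context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)              # (batch, enc_dim)
        return context, weights
```

Because the score is an MLP over both states, the decoder can attend to any source position directly, which is what removes the fixed-length bottleneck of a single encoder vector.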

Luong Attention (Effective Approaches to Attention-based Neural Machine Translation, Luong et al., 2015)

  • Three scoring variants: dot, general, concat
  • Global and local attention modes
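The three Luong scoring variants can be sketched as below, assuming for simplicity that encoder and decoder states share one dimension. Names and structure are illustrative, not the repository's actual interface.

```python
import torch
import torch.nn as nn

class LuongAttention(nn.Module):
    """Global attention with the dot, general, or concat scoring variant."""

    def __init__(self, dim, variant="dot"):
        super().__init__()
        self.variant = variant
        if variant == "general":
            self.W = nn.Linear(dim, dim, bias=False)       # score = s^T W h
        elif variant == "concat":
            self.W = nn.Linear(2 * dim, dim, bias=False)   # score = v^T tanh(W [s; h])
            self.v = nn.Linear(dim, 1, bias=False)

    def score(self, s, h):
        # s: (batch, dim) decoder state; h: (batch, src_len, dim) encoder states
        if self.variant == "dot":
            return torch.bmm(h, s.unsqueeze(-1)).squeeze(-1)         # (batch, src_len)
        if self.variant == "general":
            return torch.bmm(h, self.W(s).unsqueeze(-1)).squeeze(-1)
        # concat: broadcast s across source positions, then score the pair
        s_exp = s.unsqueeze(1).expand(-1, h.size(1), -1)
        return self.v(torch.tanh(self.W(torch.cat([s_exp, h], dim=-1)))).squeeze(-1)

    def forward(self, s, h):
        weights = torch.softmax(self.score(s, h), dim=-1)
        context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)
        return context, weights
```

The sketch shows global attention (all source positions scored); Luong's local variant instead restricts scoring to a window around a predicted source position.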

Papers