Monotonic Multihead Attention

Authors: Xutai Ma, Juan Miguel Pino, James Cross, Liezl Puzon, Jiatao Gu

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply MMA to the simultaneous machine translation task and demonstrate better latency-quality tradeoffs compared to MILk, the previous state-of-the-art approach. We analyze how the latency controls affect the attention span and we study the relationship between the speed of a head and the layer it belongs to.
Researcher Affiliation | Collaboration | Facebook; Johns Hopkins University
Pseudocode | Yes | Algorithm 1: MMA monotonic decoding (an illustrative decoding sketch is given after the table).
Open Source Code | Yes | The code is available at https://github.com/pytorch/fairseq/tree/master/examples/simultaneous_translation
Open Datasets | Yes | IWSLT15 En-Vi: 133k train / 1268 validation / 1553 test; WMT15 De-En: 4.5M train / 3000 validation / 2169 test
Dataset Splits | Yes | IWSLT15 En-Vi: 133k train / 1268 validation / 1553 test; WMT15 De-En: 4.5M train / 3000 validation / 2169 test
Hardware Specification | No | The paper does not explicitly describe the hardware specifications (e.g., GPU model, CPU, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the Fairseq library (Ott et al., 2019) but does not provide specific version numbers for Fairseq or any other software dependencies.
Experiment Setup | Yes | Detailed hyperparameter settings can be found in subsection A.1. WMT15 German-English / IWSLT English-Vietnamese: encoder embed dim 1024 / 512; encoder ffn embed dim 4096 / 1024; encoder attention heads 16 / 4; encoder layers 6; decoder embed dim 1024 / 512; decoder ffn embed dim 4096 / 1024; decoder attention heads 16 / 4; decoder layers 6; dropout 0.3; optimizer adam; adam-β (0.9, 0.98); clip-norm 0.0; lr 0.0005; lr scheduler inverse sqrt; warmup-updates 4000; warmup-init-lr 1e-07; label-smoothing 0.1; max tokens 3584 × 8 × 8 / 16000 × 2
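
For reference, the per-dataset settings above can be collected into a small configuration mapping. The sketch below is only an illustrative consolidation: the key names are generic rather than Fairseq's exact flag names, shared values are factored into a common dict, and the max-tokens setting is left out.

```python
# Illustrative consolidation of the reported training hyperparameters.
# Key names are generic, not necessarily Fairseq's exact flag names.
COMMON = {
    "encoder_layers": 6,
    "decoder_layers": 6,
    "dropout": 0.3,
    "optimizer": "adam",
    "adam_betas": (0.9, 0.98),
    "clip_norm": 0.0,
    "lr": 5e-4,
    "lr_scheduler": "inverse_sqrt",
    "warmup_updates": 4000,
    "warmup_init_lr": 1e-7,
    "label_smoothing": 0.1,
}

CONFIGS = {
    "WMT15 De-En": {
        **COMMON,
        "encoder_embed_dim": 1024,
        "encoder_ffn_embed_dim": 4096,
        "encoder_attention_heads": 16,
        "decoder_embed_dim": 1024,
        "decoder_ffn_embed_dim": 4096,
        "decoder_attention_heads": 16,
    },
    "IWSLT15 En-Vi": {
        **COMMON,
        "encoder_embed_dim": 512,
        "encoder_ffn_embed_dim": 1024,
        "encoder_attention_heads": 4,
        "decoder_embed_dim": 512,
        "decoder_ffn_embed_dim": 1024,
        "decoder_attention_heads": 4,
    },
}
```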
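
To make the pseudocode row concrete, the following is a minimal, hypothetical sketch of hard monotonic decoding with several attention heads in the spirit of Algorithm 1. The stepwise probability p_choose, the decoder call, and the output tokens are placeholders standing in for a trained MMA model, not the paper's implementation.

```python
import random

random.seed(0)


def p_choose(head, src_pos, tgt_pos):
    """Placeholder for the learned stepwise probability (sigmoid of an
    energy) that head `head` stops reading at source position `src_pos`
    while producing target position `tgt_pos`."""
    return random.random()


def mma_hard_decode(src_len, num_heads, max_tgt_len):
    # Each head keeps its own monotonic read pointer into the source;
    # pointers persist across target steps and never move backwards.
    read_ptr = [0] * num_heads
    outputs = []
    for i in range(max_tgt_len):
        # Every head independently reads further source tokens until it
        # decides to stop (p_choose >= 0.5) or the source is exhausted.
        for h in range(num_heads):
            while read_ptr[h] < src_len - 1 and p_choose(h, read_ptr[h], i) < 0.5:
                read_ptr[h] += 1
        # A target token is written only after all heads have halted, so
        # the number of source tokens consumed is set by the furthest head.
        consumed = max(read_ptr) + 1
        outputs.append((f"<tok_{i}>", consumed))  # stand-in for the real decoder output
    return outputs


if __name__ == "__main__":
    print(mma_hard_decode(src_len=10, num_heads=4, max_tgt_len=5))
```

Each head advances its own read pointer monotonically; a target token is emitted only once every head has halted, so the amount of source context consumed at each step is determined by the head that has read the furthest.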