Area Attention

Authors: Yang Li, Lukasz Kaiser, Samy Bengio, Si Si

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate area attention on two tasks: neural machine translation (both character and token-level) and image captioning, and improve upon strong (state-of-the-art) baselines in all the cases.
Researcher Affiliation | Industry | Google Research, Mountain View, CA, USA. Correspondence to: Yang Li <liyang@google.com>.
Pseudocode | Yes | We present the pseudo code for performing Eq. 3, 4 and 5 as well as the shape size of each area in Algorithm 1 and 2. (A hedged sketch of the area computation follows the table.)
Open Source Code | Yes | See TensorFlow implementation of Area Attention as well as its integration with Transformer and LSTM in https://github.com/tensorflow/tensor2tensor.
Open Datasets | Yes | We use the same dataset as the one used in (Vaswani et al., 2017), in which the WMT 2014 English-German (EN-DE) dataset contains about 4.5 million English-German sentence pairs and the English-French (EN-FR) dataset has about 36 million English-French sentence pairs (Wu et al., 2016).
Dataset Splits | Yes | We trained each model based on the training & development sets provided by the COCO dataset (Lin et al., 2014), which has 82K images for training and 40K for validation.
Hardware Specification | Yes | Trained on one machine with 8 NVIDIA P100 GPUs for a total of 250,000 steps.
Software Dependencies | No | The paper mentions a "TensorFlow implementation" but does not specify a version number for TensorFlow or any other software dependencies needed to replicate the experiment.
Experiment Setup | Yes | Tiny (#hidden layers=2, hidden size=128, filter size=512, #attention heads=4), Small (#hidden layers=2, hidden size=256, filter size=1024, #attention heads=4), Base (#hidden layers=6, hidden size=512, filter size=2048, #attention heads=8) and Big (#hidden layers=6, hidden size=1024, filter size=4096 for EN-DE and 8192 for EN-FR, #attention heads=16). (See the configuration sketch after the area attention example below.)
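To make the pseudocode row concrete, below is a minimal NumPy sketch of basic 1-D area attention as described in the paper: an area is a contiguous span of memory items, its key is the mean of the item keys, its value is the sum of the item values, and standard scaled dot-product attention is then applied over all areas. The function name, the brute-force enumeration of areas, and the single-query, single-head setting are illustrative simplifications, not the authors' tensor2tensor implementation, which computes area features efficiently with summed-area tables and also supports richer area features (e.g., standard deviation and shape embeddings).

```python
import numpy as np

def area_attention_1d(query, keys, values, max_area_width=3):
    """Minimal 1-D area attention sketch (single query, single head).

    Areas are contiguous spans of up to `max_area_width` memory items.
    Each area's key is the mean of its item keys and its value is the
    sum of its item values; softmax attention is then applied over the
    enlarged memory of all areas.
    """
    n, d = keys.shape
    area_keys, area_values = [], []
    for width in range(1, max_area_width + 1):
        for start in range(0, n - width + 1):
            area_keys.append(keys[start:start + width].mean(axis=0))   # area key: mean
            area_values.append(values[start:start + width].sum(axis=0))  # area value: sum
    area_keys = np.stack(area_keys)      # [num_areas, d]
    area_values = np.stack(area_values)  # [num_areas, d_v]

    # Scaled dot-product attention over all areas.
    scores = area_keys @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ area_values

# Toy usage: 5 memory items with 4-dimensional keys/values, one query.
rng = np.random.default_rng(0)
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 4))
q = rng.normal(size=(4,))
print(area_attention_1d(q, k, v, max_area_width=2).shape)  # (4,)
```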
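For reference, the four Transformer configurations quoted in the Experiment Setup row can be collected into a plain Python dictionary, as sketched below. The field names are illustrative only and are not the tensor2tensor hyperparameter names.

```python
# Illustrative summary of the Tiny/Small/Base/Big configurations quoted above;
# key names are hypothetical, not tensor2tensor hyperparameters.
TRANSFORMER_CONFIGS = {
    "tiny":  {"hidden_layers": 2, "hidden_size": 128,  "filter_size": 512,  "attention_heads": 4},
    "small": {"hidden_layers": 2, "hidden_size": 256,  "filter_size": 1024, "attention_heads": 4},
    "base":  {"hidden_layers": 6, "hidden_size": 512,  "filter_size": 2048, "attention_heads": 8},
    # Big uses filter size 4096 for EN-DE and 8192 for EN-FR.
    "big":   {"hidden_layers": 6, "hidden_size": 1024, "filter_size": 4096, "attention_heads": 16},
}
```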