Area Attention
Authors: Yang Li, Lukasz Kaiser, Samy Bengio, Si Si
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate area attention on two tasks: neural machine translation (both character and token-level) and image captioning, and improve upon strong (state-of-the-art) baselines in all the cases. |
| Researcher Affiliation | Industry | Google Research, Mountain View, CA, USA. Correspondence to: Yang Li <liyang@google.com>. |
| Pseudocode | Yes | We present the Pseudo code for performing Eq. 3, 4 and 5 as well as the shape size of each area in Algorithm 1 and 2. |
| Open Source Code | Yes | See TensorFlow implementation of Area Attention as well as its integration with Transformer and LSTM in https://github.com/tensorflow/tensor2tensor. |
| Open Datasets | Yes | We use the same dataset as the one used in (Vaswani et al., 2017) in which the WMT 2014 English-German (EN-DE) dataset contains about 4.5 million English-German sentence pairs, and the English-French (EN-FR) dataset has about 36 million English-French sentence pairs (Wu et al., 2016). |
| Dataset Splits | Yes | we trained each model based on the training & development sets provided by the COCO dataset (Lin et al., 2014), which has 82K images for training and 40K for validation. |
| Hardware Specification | Yes | trained on one machine with 8 NVIDIA P100 GPUs for a total of 250,000 steps. |
| Software Dependencies | No | The paper mentions a "TensorFlow implementation" but does not specify a version number for TensorFlow or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | Tiny (#hidden layers=2, hidden size=128, filter size=512, #attention heads=4), Small (#hidden layers=2, hidden size=256, filter size=1024, #attention heads=4), Base (#hidden layers=6, hidden size=512, filter size=2048, #attention heads=8) and Big (#hidden layers=6, hidden size=1024, filter size=4096 for EN-DE and 8192 for EN-FR, #attention heads=16). |
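
The Pseudocode row above points to Algorithms 1 and 2 for Eq. 3, 4 and 5, which define how areas (contiguous spans of memory items) are turned into an enlarged attention memory: the key of an area is the mean of its item keys and the value of an area is the sum of its item values. As a rough illustration only, here is a minimal NumPy sketch of the 1-D case; the names `area_keys_values_1d`, `area_attention`, and `max_area_width` are illustrative and are not the tensor2tensor API.

```python
import numpy as np

def area_keys_values_1d(keys, values, max_area_width):
    """Build the enlarged memory for 1-D area attention: for every contiguous
    span of items up to max_area_width, the area key is the mean of the item
    keys and the area value is the sum of the item values."""
    length, _ = keys.shape
    area_keys, area_values = [], []
    for width in range(1, max_area_width + 1):
        for start in range(length - width + 1):
            span = slice(start, start + width)
            area_keys.append(keys[span].mean(axis=0))     # mean key of the area
            area_values.append(values[span].sum(axis=0))  # summed value of the area
    return np.stack(area_keys), np.stack(area_values)

def area_attention(query, keys, values, max_area_width=3):
    """Standard scaled dot-product attention computed over the area memory."""
    ak, av = area_keys_values_1d(keys, values, max_area_width)
    scores = query @ ak.T / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ av
```

A real implementation would vectorize the area construction; the explicit loops above are kept only for clarity.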
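
For readability, the four Transformer configurations quoted in the Experiment Setup row can also be written out as plain Python dicts; the key names below are descriptive labels, not tensor2tensor hparam names.

```python
# Transformer configurations as reported in the Experiment Setup row.
TRANSFORMER_CONFIGS = {
    "tiny":  {"hidden_layers": 2, "hidden_size": 128,  "filter_size": 512,  "attention_heads": 4},
    "small": {"hidden_layers": 2, "hidden_size": 256,  "filter_size": 1024, "attention_heads": 4},
    "base":  {"hidden_layers": 6, "hidden_size": 512,  "filter_size": 2048, "attention_heads": 8},
    "big":   {"hidden_layers": 6, "hidden_size": 1024,
              "filter_size": 4096,  # 8192 for EN-FR
              "attention_heads": 16},
}
```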