Context-Aware Self-Attention Networks
Authors: Baosong Yang, Jian Li, Derek F. Wong, Lidia S. Chao, Xing Wang, Zhaopeng Tu
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on WMT14 English⇒German and WMT17 Chinese⇒English translation tasks demonstrate the effectiveness and universality of the proposed methods. |
| Researcher Affiliation | Collaboration | 1 NLP2CT Lab, Department of Computer and Information Science, University of Macau nlp2ct.baosong@gmail.com, {derekfw,lidiasc}@umac.mo 2 The Chinese University of Hong Kong jianli@cse.cuhk.edu.hk 3 Tencent AI Lab {brightxwang,zptu}@tencent.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | For the En⇒De task, we trained on the widely-used WMT14 dataset consisting of about 4.56 million sentence pairs. [...] For the Zh⇒En task, the models were trained using all of the available parallel corpus from WMT17 dataset, consisting of about 20.62 million sentence pairs. |
| Dataset Splits | Yes | The models were validated on newstest2013 and examined on newstest2014. [...] We used newsdev2017 as the development set and newstest2017 as the test set. |
| Hardware Specification | Yes | All the models are trained on eight NVIDIA P40 GPUs, each of which is allocated a batch of 4096 tokens. |
| Software Dependencies | No | The paper mentions 'scripts provided in Moses' and 'byte-pair encoding (BPE)' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We followed (Vaswani et al. 2017) to set the configurations and reproduced their reported results on the En⇒De task. We tested both the Base and Big models, which differ at the layer size (512 vs. 1024) and the number of attention heads (8 vs. 16). [...] each of which is allocated a batch of 4096 tokens. |
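The Base/Big configurations and batching details quoted in the Experiment Setup and Hardware rows can be summarized in a minimal sketch. Since the paper releases no code, the field names below (`hidden_size`, `num_heads`, `tokens_per_gpu`, `num_gpus`) are illustrative assumptions rather than identifiers from the authors' implementation or any specific toolkit; only the numeric values come from the quoted text.

```python
from dataclasses import dataclass


@dataclass
class TransformerConfig:
    """Hyperparameters reported in the Experiment Setup row.

    Field names are assumptions for this sketch, not taken from the paper
    or any particular framework.
    """
    hidden_size: int            # "layer size" in the paper's wording
    num_heads: int              # number of attention heads
    tokens_per_gpu: int = 4096  # batch of 4096 tokens allocated per GPU
    num_gpus: int = 8           # eight NVIDIA P40 GPUs


# Base vs. Big differ in layer size (512 vs. 1024) and attention heads (8 vs. 16).
BASE = TransformerConfig(hidden_size=512, num_heads=8)
BIG = TransformerConfig(hidden_size=1024, num_heads=16)

# Effective number of tokens processed per training step across all GPUs.
for name, cfg in (("Base", BASE), ("Big", BIG)):
    print(f"{name}: {cfg.tokens_per_gpu * cfg.num_gpus} tokens per step")
```

Under these assumptions, both models see 8 × 4096 = 32,768 tokens per training step; only the model width and head count differ between Base and Big.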