Context-Aware Self-Attention Networks

Authors: Baosong Yang, Jian Li, Derek F. Wong, Lidia S. Chao, Xing Wang, Zhaopeng Tu

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on WMT14 English⇒German and WMT17 Chinese⇒English translation tasks demonstrate the effectiveness and universality of the proposed methods.
Researcher Affiliation | Collaboration | (1) NLP2CT Lab, Department of Computer and Information Science, University of Macau: nlp2ct.baosong@gmail.com, {derekfw,lidiasc}@umac.mo; (2) The Chinese University of Hong Kong: jianli@cse.cuhk.edu.hk; (3) Tencent AI Lab: {brightxwang,zptu}@tencent.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | For the En⇒De task, we trained on the widely-used WMT14 dataset consisting of about 4.56 million sentence pairs. [...] For the Zh⇒En task, the models were trained using all of the available parallel corpus from WMT17 dataset, consisting of about 20.62 million sentence pairs.
Dataset Splits | Yes | The models were validated on newstest2013 and examined on newstest2014. [...] We used newsdev2017 as the development set and newstest2017 as the test set.
Hardware Specification | Yes | All the models are trained on eight NVIDIA P40 GPUs, each of which is allocated a batch of 4096 tokens.
Software Dependencies | No | The paper mentions 'scripts provided in Moses' and 'byte-pair encoding (BPE)' but does not provide specific version numbers for any software dependencies. (A hedged preprocessing sketch follows this table.)
Experiment Setup | Yes | We followed (Vaswani et al. 2017) to set the configurations and reproduced their reported results on the En⇒De task. We tested both the Base and Big models, which differ at the layer size (512 vs. 1024) and the number of attention heads (8 vs. 16). [...] each of which is allocated a batch of 4096 tokens. (See the configuration sketch below.)
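
The Software Dependencies row names the Moses scripts and byte-pair encoding but no versions. As a rough illustration of what that preprocessing typically involves, the sketch below tokenizes with the Moses tokenizer script and learns/applies BPE with the subword-nmt package; the file names, the mosesdecoder path, and the 32k merge count are assumptions for illustration, not values reported in the paper.

```python
import subprocess

# Assumed locations and sizes; the paper does not pin versions, paths, or merge counts.
MOSES_TOKENIZER = "mosesdecoder/scripts/tokenizer/tokenizer.perl"
RAW, TOK, CODES, BPE_OUT = "train.en", "train.tok.en", "bpe.codes", "train.bpe.en"
MERGES = 32000  # assumed BPE merge count, not taken from the quoted text

# 1) Tokenize with the Moses tokenizer (Perl script reads stdin, writes stdout).
with open(RAW) as fin, open(TOK, "w") as fout:
    subprocess.run(["perl", MOSES_TOKENIZER, "-l", "en"],
                   stdin=fin, stdout=fout, check=True)

# 2) Learn BPE merges and apply them with subword-nmt.
subprocess.run(["subword-nmt", "learn-bpe", "-s", str(MERGES),
                "--input", TOK, "--output", CODES], check=True)
subprocess.run(["subword-nmt", "apply-bpe", "-c", CODES,
                "--input", TOK, "--output", BPE_OUT], check=True)
```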
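
The Experiment Setup row quotes only the Base/Big layer sizes, head counts, and per-GPU token batch. The minimal sketch below collects those numbers into a configuration object; the class and field names are illustrative, and the 6-layer depth is assumed from the Vaswani et al. (2017) configuration the authors say they followed rather than stated in the quoted text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransformerSetup:
    """Hyperparameters from the Experiment Setup row (illustrative field names)."""
    hidden_size: int                  # "layer size": 512 for Base, 1024 for Big
    num_heads: int                    # attention heads: 8 for Base, 16 for Big
    num_layers: int = 6               # assumed from Vaswani et al. (2017)
    batch_tokens_per_gpu: int = 4096  # tokens allocated to each of the 8 P40 GPUs

BASE = TransformerSetup(hidden_size=512, num_heads=8)
BIG = TransformerSetup(hidden_size=1024, num_heads=16)

if __name__ == "__main__":
    for name, cfg in (("Base", BASE), ("Big", BIG)):
        print(f"Transformer-{name}: {cfg}")
```

With eight GPUs at 4096 tokens each, the effective batch works out to roughly 32k tokens per training step.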