Context-Aware Self-Attention Networks

Authors: Baosong Yang, Jian Li, Derek F. Wong, Lidia S. Chao, Xing Wang, Zhaopeng Tu

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on WMT14 English⇒German and WMT17 Chinese⇒English translation tasks demonstrate the effectiveness and universality of the proposed methods.
Researcher Affiliation | Collaboration | (1) NLP2CT Lab, Department of Computer and Information Science, University of Macau: nlp2ct.baosong@gmail.com, {derekfw,lidiasc}@umac.mo; (2) The Chinese University of Hong Kong: jianli@cse.cuhk.edu.hk; (3) Tencent AI Lab: {brightxwang,zptu}@tencent.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | For the En⇒De task, we trained on the widely-used WMT14 dataset consisting of about 4.56 million sentence pairs. [...] For the Zh⇒En task, the models were trained using all of the available parallel corpus from WMT17 dataset, consisting of about 20.62 million sentence pairs.
Dataset Splits | Yes | The models were validated on newstest2013 and examined on newstest2014. [...] We used newsdev2017 as the development set and newstest2017 as the test set.
Hardware Specification | Yes | All the models are trained on eight NVIDIA P40 GPUs, each of which is allocated a batch of 4096 tokens.
Software Dependencies | No | The paper mentions 'scripts provided in Moses' and 'byte-pair encoding (BPE)' but does not provide specific version numbers for any software dependencies. (A hedged preprocessing sketch follows this table.)
Experiment Setup | Yes | We followed (Vaswani et al. 2017) to set the configurations and reproduced their reported results on the En⇒De task. We tested both the Base and Big models, which differ at the layer size (512 vs. 1024) and the number of attention heads (8 vs. 16). [...] each of which is allocated a batch of 4096 tokens. (See the configuration sketch below.)
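
The Software Dependencies row names the Moses scripts and byte-pair encoding but no versions. As a rough illustration of what that preprocessing typically involves, the sketch below tokenizes with the Moses tokenizer script and learns/applies BPE with the subword-nmt package; the file names, the mosesdecoder path, and the 32k merge count are assumptions for illustration, not values reported in the paper.

```python
import subprocess

# Assumed locations and sizes; the paper does not pin versions, paths, or merge counts.
MOSES_TOKENIZER = "mosesdecoder/scripts/tokenizer/tokenizer.perl"
RAW, TOK, CODES, BPE_OUT = "train.en", "train.tok.en", "bpe.codes", "train.bpe.en"
MERGES = 32000  # assumed BPE merge count, not taken from the quoted text

# 1) Tokenize with the Moses tokenizer (Perl script reads stdin, writes stdout).
with open(RAW) as fin, open(TOK, "w") as fout:
    subprocess.run(["perl", MOSES_TOKENIZER, "-l", "en"],
                   stdin=fin, stdout=fout, check=True)

# 2) Learn BPE merges and apply them with subword-nmt.
subprocess.run(["subword-nmt", "learn-bpe", "-s", str(MERGES),
                "--input", TOK, "--output", CODES], check=True)
subprocess.run(["subword-nmt", "apply-bpe", "-c", CODES,
                "--input", TOK, "--output", BPE_OUT], check=True)
```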
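
The Experiment Setup row quotes only the Base/Big layer sizes, head counts, and per-GPU token batch. The minimal sketch below collects those numbers into a configuration object; the class and field names are illustrative, and the 6-layer depth is assumed from the Vaswani et al. (2017) configuration the authors say they followed rather than stated in the quoted text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransformerSetup:
    """Hyperparameters from the Experiment Setup row (illustrative field names)."""
    hidden_size: int                  # "layer size": 512 for Base, 1024 for Big
    num_heads: int                    # attention heads: 8 for Base, 16 for Big
    num_layers: int = 6               # assumed from Vaswani et al. (2017)
    batch_tokens_per_gpu: int = 4096  # tokens allocated to each of the 8 P40 GPUs

BASE = TransformerSetup(hidden_size=512, num_heads=8)
BIG = TransformerSetup(hidden_size=1024, num_heads=16)

if __name__ == "__main__":
    for name, cfg in (("Base", BASE), ("Big", BIG)):
        print(f"Transformer-{name}: {cfg}")
```

With eight GPUs at 4096 tokens each, the effective batch works out to roughly 32k tokens per training step.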