Neural Machine Translation with Key-Value Memory-Augmented Attention

Authors: Fandong Meng, Zhaopeng Tu, Yong Cheng, Haiyang Wu, Junjie Zhai, Yuekui Yang, Di Wang

IJCAI 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results on Chinese-English and WMT17 German-English translation tasks demonstrate the superiority of the proposed model. |
| Researcher Affiliation | Industry | Fandong Meng, Zhaopeng Tu, Yong Cheng, Haiyang Wu, Junjie Zhai, Yuekui Yang, Di Wang (Tencent AI Lab), {fandongmeng,zptu,yongcheng,gavinwu,jasonzhai,yuekuiyang,diwang}@tencent.com |
| Pseudocode | No | The paper describes the model components and operations in text and equations, but contains no structured pseudocode or algorithm blocks. (An illustrative sketch of key-value memory attention is given below the table.) |
| Open Source Code | No | The paper provides no statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | For Zh-En, the training data consist of 1.25M sentence pairs extracted from LDC corpora. For De-En, the experiments use the corpus provided by WMT17, which contains 5.6M sentence pairs. |
| Dataset Splits | Yes | For Zh-En, NIST 2002 (MT02) is the valid set and NIST 2003-2006 (MT03-06) are the test sets. For De-En, newstest2016 is the development set and newstest2017 is the test set. |
| Hardware Specification | Yes | When running on a single GPU (Tesla P40), the speed of the RNNSEARCH model is 2773 target words per second, while the speed of the proposed models is 1676-2263 target words per second. |
| Software Dependencies | No | The paper mentions optimizers (SGD, AdaDelta) and GRU-based RNNs, but provides no version numbers for any software libraries or frameworks. |
| Experiment Setup | Yes | The parameters are updated by SGD and mini-batch (size 80) with learning rate controlled by AdaDelta [Zeiler, 2012] (ε = 1e-6 and ρ = 0.95). ... The dimension of word embedding and hidden layer is 512, and the beam size in testing is 10. Dropout is applied on the output layer to avoid over-fitting [Hinton et al., 2012], with a dropout rate of 0.5. Hyper-parameter λ in Eq. 19 is set to 1.0. (An AdaDelta sketch with these settings is given below the table.) |
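
Since the paper provides no pseudocode, the following is a minimal NumPy sketch of one step of generic key-value memory-augmented attention, the mechanism named in the title: a fixed key-memory is used for addressing, while a separate value-memory is read from and then updated as decoding proceeds. The function names, shapes, and in particular the value-update rule are illustrative assumptions, not the authors' equation-level formulation.

```python
import numpy as np

def softmax(x):
    x = x - x.max()            # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum()

def kv_attention_step(query, keys, values, Wq, Wk, v, Wu):
    """One read/update step over a key-value memory.

    query  : (d,)   current decoder state s_t
    keys   : (n, d) fixed key-memory used for addressing
                    (e.g. the encoder annotations)
    values : (n, d) value-memory that is read and then updated
    """
    # Additive (Bahdanau-style) scoring against the keys only.
    scores = np.tanh(query @ Wq + keys @ Wk) @ v     # (n,)
    alpha = softmax(scores)                          # attention weights
    context = alpha @ values                         # read from the values
    # Illustrative update: interpolate heavily attended value slots toward
    # a projection of the current state, so the memory can track what has
    # already been translated. The paper's actual update equations differ.
    values = values + alpha[:, None] * (np.tanh(query @ Wu) - values)
    return context, values, alpha

# Toy usage with assumed sizes: n source positions, d hidden units.
n, d = 7, 512
rng = np.random.default_rng(0)
Wq, Wk, Wu = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
v = rng.normal(scale=0.1, size=d)
keys = rng.normal(size=(n, d))
values = keys.copy()    # value-memory initialised from the keys
context, values, alpha = kv_attention_step(
    rng.normal(size=d), keys, values, Wq, Wk, v, Wu)
```

Keeping the keys fixed while only the values change is the core design point: addressing stays stable across decoding steps even as the memory's contents record translation progress.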
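
The Experiment Setup row reports AdaDelta with ε = 1e-6 and ρ = 0.95 but names no framework or library version. As a concrete reference for those two hyper-parameters, here is a self-contained NumPy sketch of the AdaDelta update rule from Zeiler [2012]; the class and variable names are assumptions for illustration.

```python
import numpy as np

class AdaDelta:
    """AdaDelta update rule (Zeiler, 2012) with the paper's reported settings."""

    def __init__(self, shape, rho=0.95, eps=1e-6):
        self.rho, self.eps = rho, eps
        self.acc_grad = np.zeros(shape)   # running average of squared gradients
        self.acc_step = np.zeros(shape)   # running average of squared updates

    def step(self, params, grad):
        self.acc_grad = self.rho * self.acc_grad + (1 - self.rho) * grad ** 2
        # Per-dimension step size: RMS of past updates over RMS of gradients.
        update = (-np.sqrt(self.acc_step + self.eps)
                  / np.sqrt(self.acc_grad + self.eps) * grad)
        self.acc_step = self.rho * self.acc_step + (1 - self.rho) * update ** 2
        return params + update

# Toy usage: minimise f(w) = ||w||^2, whose gradient is 2w.
w = np.ones(4)
opt = AdaDelta(w.shape)
for _ in range(100):
    w = opt.step(w, 2 * w)
```

The remaining reported settings (mini-batch size 80, embedding and hidden size 512, dropout 0.5, beam size 10, λ = 1.0) are model and search hyper-parameters, independent of this update rule.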