Non-Autoregressive Neural Machine Translation

Authors: Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, Richard Socher

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5 EXPERIMENTS: We evaluate the proposed NAT on three widely used public machine translation corpora..." "Table 1: BLEU scores on official test sets..." "5.3 ABLATION STUDY"
Researcher Affiliation | Collaboration | Salesforce Research {james.bradbury,cxiong,rsocher}@salesforce.com; The University of Hong Kong {jiataogu, vli}@eee.hku.hk
Pseudocode | No | The paper provides architectural diagrams and mathematical formulations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Implementation: We have open-sourced our PyTorch implementation of the NAT" (https://github.com/salesforce/nonauto-nmt)
Open Datasets | Yes | "We evaluate the proposed NAT on three widely used public machine translation corpora: IWSLT16 En-De (https://wit3.fbk.eu/), WMT14 En-De (http://www.statmt.org/wmt14/translation-task), and WMT16 En-Ro (http://www.statmt.org/wmt16/translation-task)."
Dataset Splits | Yes | "We use IWSLT which is smaller than the other two datasets as the development dataset for ablation experiments, and additionally train and test our primary models on both directions of both WMT datasets." "Table 1: BLEU scores on official test sets (newstest2014 for WMT En-De and newstest2016 for WMT En-Ro) or the development set for IWSLT."
Hardware Specification | Yes | "Latency is computed as the time to decode a single sentence without minibatching, averaged over the whole test set; decoding is implemented in PyTorch on a single NVIDIA Tesla P100." (See the latency-timing sketch after the table.)
Software Dependencies | No | The paper mentions using PyTorch for implementation but does not specify its version number or other software dependencies with their versions.
Experiment Setup | Yes | "Hyperparameters: For experiments on WMT datasets, we use the hyperparameter settings of the base Transformer model described in Vaswani et al. (2017), though without label smoothing. As IWSLT is a smaller corpus, and to reduce training time, we use a set of smaller hyperparameters (d_model = 287, d_hidden = 507, n_layer = 5, n_head = 2, and t_warmup = 746) for all experiments on that dataset. For fine-tuning we use λ = 0.25." (A hedged config sketch follows the table.)
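
The quoted hyperparameters translate into a small, explicit configuration. The following is a minimal sketch only: the class name `NATConfig` and the field names are illustrative and are not taken from the released salesforce/nonauto-nmt code; the WMT values are the standard base-Transformer settings from Vaswani et al. (2017), which the paper says it follows.

```python
# Minimal sketch of the reported training configurations as plain Python.
# Names are hypothetical; only the numeric values come from the quoted text
# (IWSLT) or from the base Transformer of Vaswani et al. (2017) (WMT).
from dataclasses import dataclass

@dataclass
class NATConfig:
    d_model: int            # model / embedding dimension
    d_hidden: int           # feed-forward hidden dimension
    n_layers: int           # encoder and decoder layers
    n_heads: int            # attention heads
    warmup_steps: int       # learning-rate warmup steps
    label_smoothing: float  # paper trains without label smoothing
    finetune_lambda: float  # fine-tuning weight (λ = 0.25 in the paper)

# Smaller IWSLT16 model, as quoted above (reduces training time).
iwslt_config = NATConfig(
    d_model=287, d_hidden=507, n_layers=5, n_heads=2,
    warmup_steps=746, label_smoothing=0.0, finetune_lambda=0.25,
)

# WMT models use the base Transformer settings of Vaswani et al. (2017),
# again without label smoothing.
wmt_config = NATConfig(
    d_model=512, d_hidden=2048, n_layers=6, n_heads=8,
    warmup_steps=4000, label_smoothing=0.0, finetune_lambda=0.25,
)
```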
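The latency protocol in the Hardware Specification row (single-sentence decoding, no minibatching, averaged over the whole test set) could be reproduced along the lines of the sketch below. It is an assumption-laden illustration, not the paper's measurement code: `model.translate` and `test_sentences` are hypothetical placeholders, and only the overall protocol matches the quote.

```python
# Sketch of per-sentence decoding latency, averaged over a test set,
# assuming a hypothetical `model.translate(sentence)` decode call.
import time
import torch

def average_decoding_latency(model, test_sentences, use_cuda=True):
    model.eval()
    total = 0.0
    with torch.no_grad():
        for src in test_sentences:           # one sentence at a time, no minibatching
            if use_cuda:
                torch.cuda.synchronize()     # ensure prior GPU work has finished
            start = time.perf_counter()
            _ = model.translate(src)         # hypothetical single-sentence decode
            if use_cuda:
                torch.cuda.synchronize()     # wait for decode kernels to complete
            total += time.perf_counter() - start
    return total / len(test_sentences)       # mean seconds per sentence
```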