Duplex Sequence-to-Sequence Learning for Reversible Machine Translation

Authors: Zaixiang Zheng, Hao Zhou, Shujian Huang, Jiajun Chen, Jingjing Xu, Lei Li

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on standard machine translation benchmarks to inspect REDER's performance on seq2seq tasks. Experimental results show that the duplex idea indeed works: overall, REDER achieves BLEU scores of 27.50 and 31.25 on the standard WMT14 EN-DE and DE-EN benchmarks, respectively.
Researcher Affiliation | Collaboration | Zaixiang Zheng (1,2), Hao Zhou (2), Shujian Huang (1), Jiajun Chen (1), Jingjing Xu (2), Lei Li (3); 1) National Key Laboratory for Novel Software Technology, Nanjing University; 2) ByteDance AI Lab; 3) UC Santa Barbara.
Pseudocode | No | The paper describes the model architecture and computations using mathematical formulas and diagrams (e.g., Figure 2 and the equations for its reversible layers) but does not include any explicit pseudocode or algorithm blocks; a reversible-layer sketch is given after this table.
Open Source Code | Yes | Code is available at https://github.com/zhengzx-nlp/REDER.
Open Datasets | Yes | We evaluate our proposal on two standard translation benchmarks, i.e., WMT14 English (EN)↔German (DE) (4.5M training pairs) and WMT16 English (EN)↔Romanian (RO) (610K training pairs).
Dataset Splits | No | The paper states 'We measure the validation BLEU scores for every 2,000 updates, and average the best 5 checkpoints to obtain the final model,' indicating that a validation set was used, but it does not give the size or split percentages of this validation set relative to the main datasets; a checkpoint-averaging sketch follows this table.
Hardware Specification | Yes | All models are trained for 300K updates using Nvidia V100 GPUs with a batch size of approximately 64K tokens. We train REDER on WMT14 EN↔DE using 8 32GB V100 GPUs for 432 GPU hours (54 hours per GPU) and obtain a bidirectional translation model.
Software Dependencies | No | The paper states 'All models are implemented on fairseq [Ott et al., 2019]' and mentions 'an efficient library of C++ implementation' for CTC beam search, but it does not provide version numbers for these or for other software dependencies such as Python or PyTorch; a minimal sketch of the CTC collapsing rule appears after this table.
Experiment Setup | Yes | We design REDER based on the hyper-parameters of Transformer-base [Vaswani et al., 2017]. All models are implemented on fairseq [Ott et al., 2019]. REDER consists of 12 stacked layers. The number of attention heads is 8, the model dimension is 512, and the inner dimension of the FFN is 2048. For both AT and NAT models, we set the dropout rate to 0.1 for WMT14 EN↔DE and WMT16 EN↔RO. We adopt weight decay with a decay rate of 0.01 and label smoothing with ϵ = 0.1. By default, we upsample the source input by a factor of 2 for CTC-based models. We set λ_fba and λ_cc to 0.1 for all experiments. All models are trained for 300K updates using Nvidia V100 GPUs with a batch size of approximately 64K tokens. A configuration sketch collecting these hyper-parameters is given below.
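
Since the paper presents its reversible layers only through equations and Figure 2 (see the Pseudocode row), the following is a minimal sketch of a generic RevNet-style additive-coupling layer, the kind of building block REDER's reversible network is based on. The class name ReversibleCouplingLayer and the toy f/g sub-modules are illustrative assumptions, not the authors' implementation; in the actual model the sub-layers would be Transformer attention and feed-forward blocks.

```python
import torch
import torch.nn as nn


class ReversibleCouplingLayer(nn.Module):
    """Additive-coupling (RevNet-style) layer: the input is split into two
    halves (x1, x2), and each input half can be reconstructed exactly from
    the outputs, so intermediate activations need not be stored."""

    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f = f  # first sub-layer (e.g. self-attention; assumed here)
        self.g = g  # second sub-layer (e.g. feed-forward; assumed here)

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        # y1 = x1 + F(x2);  y2 = x2 + G(y1)
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1: torch.Tensor, y2: torch.Tensor):
        # Exact inversion: x2 = y2 - G(y1);  x1 = y1 - F(x2)
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2


if __name__ == "__main__":
    d = 256  # toy dimension for the round-trip check
    layer = ReversibleCouplingLayer(
        f=nn.Sequential(nn.LayerNorm(d), nn.Linear(d, d)),
        g=nn.Sequential(nn.LayerNorm(d), nn.Linear(d, d)),
    )
    x1, x2 = torch.randn(8, 10, d), torch.randn(8, 10, d)
    with torch.no_grad():
        y1, y2 = layer(x1, x2)
        r1, r2 = layer.inverse(y1, y2)
    print(torch.allclose(r1, x1, atol=1e-5), torch.allclose(r2, x2, atol=1e-5))
```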
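The Dataset Splits row quotes the paper's procedure of averaging the best 5 checkpoints by validation BLEU. The snippet below is a minimal sketch of such parameter averaging; it assumes plain PyTorch state dicts (fairseq checkpoints actually nest the parameters under a 'model' key, and fairseq ships its own averaging utility), and the file names in the usage comment are hypothetical.

```python
import torch


def average_checkpoints(paths):
    """Average the parameter tensors of several checkpoints (e.g. the 5
    best by validation BLEU) into a single state dict."""
    avg_state = None
    for path in paths:
        # Assumed layout: the file holds a plain state dict of tensors.
        state = torch.load(path, map_location="cpu")
        if avg_state is None:
            avg_state = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg_state[k] += v.float()
    for k in avg_state:
        avg_state[k] /= len(paths)
    return avg_state


# Hypothetical usage with the 5 best checkpoints by validation BLEU:
# best5 = ["ckpt_a.pt", "ckpt_b.pt", "ckpt_c.pt", "ckpt_d.pt", "ckpt_e.pt"]
# torch.save(average_checkpoints(best5), "averaged.pt")
```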
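The Software Dependencies row mentions an efficient C++ library for CTC beam search, and the Experiment Setup row notes that the source is upsampled by a factor of 2 for CTC-based models. For reference, the function below sketches only the CTC collapsing rule (merge consecutive repeats, then drop blanks) applied greedily; the blank_id convention and the example IDs are assumptions, and this is not the beam-search decoder the authors used.

```python
def ctc_greedy_collapse(token_ids, blank_id=0):
    """Greedy CTC post-processing: merge consecutive repeated tokens,
    then remove blanks. blank_id=0 is an assumed convention."""
    out = []
    prev = None
    for t in token_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out


# With a 2x-upsampled source of length 2n, the model emits 2n tokens,
# which collapse to a target of length at most 2n:
print(ctc_greedy_collapse([5, 5, 0, 7, 7, 7, 0, 0, 9]))  # -> [5, 7, 9]
```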
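Finally, a hedged configuration sketch that simply collects the hyper-parameters reported in the Experiment Setup row in one place. The dictionary keys are illustrative, loosely modeled on fairseq-style option names; the paper's actual training command, architecture name, and flag names are not given in this section.

```python
# Hedged summary of the reported REDER hyper-parameters (key names assumed).
reder_config = {
    "layers": 12,              # 12 stacked (reversible) layers
    "attention_heads": 8,
    "model_dim": 512,          # Transformer-base size
    "ffn_dim": 2048,
    "dropout": 0.1,            # WMT14 EN<->DE and WMT16 EN<->RO
    "weight_decay": 0.01,
    "label_smoothing": 0.1,
    "upsample_ratio": 2,       # source upsampling for CTC-based models
    "lambda_fba": 0.1,         # auxiliary loss weight reported in the paper
    "lambda_cc": 0.1,          # auxiliary loss weight reported in the paper
    "max_update": 300_000,
    "batch_tokens": 64_000,    # ~64K tokens per update
    "hardware": "Nvidia V100 GPUs",
}
```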