Regularizing Neural Machine Translation by Target-Bidirectional Agreement
Authors: Zhirui Zhang, Shuangzhi Wu, Shujie Liu, Mu Li, Ming Zhou, Tong Xu
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our proposed method significantly outperforms state-of-the-art baselines on Chinese-English and English-German translation tasks. |
| Researcher Affiliation | Collaboration | University of Science and Technology of China, Hefei, China; Harbin Institute of Technology, Harbin, China; Microsoft Research Asia |
| Pseudocode | Yes | Algorithm 1 Training Algorithm for L2R Model |
| Open Source Code | No | The paper mentions external tools and scripts (e.g., Moses scripts, Tensorflow T2T, SacreBLEU) but does not provide a link to or statement about the availability of their own source code for the proposed methodology. |
| Open Datasets | Yes | For NIST Open MT's Chinese-English translation task, we select our training data from LDC corpora, which consists of 2.6M sentence pairs... and For WMT17's English-German translation task, we use the pre-processed training data provided by the task organizers. Footnotes in the paper provide the specific LDC IDs and URLs. |
| Dataset Splits | Yes | The NIST Open MT 2006 evaluation set is used as validation set, and NIST 2003, 2005, 2008, 2012 datasets as test sets. and We use the newstest2016 as the validation set and the newstest2017 as the test set. |
| Hardware Specification | Yes | All models are trained on 4 Tesla M40 GPUs |
| Software Dependencies | Yes | The Transformer model (Vaswani et al. 2017) is adopted as our baseline. For all translation tasks, we follow the transformer_base_v2 hyper-parameter setting, which corresponds to a 6-layer transformer with a model size of 512. (The corresponding footnote links to tensorflow/tensor2tensor/blob/v1.3.0/tensor2tensor/models/transformer.py.) Also mentions the Moses multi-bleu.perl script and the official SacreBLEU tool. |
| Experiment Setup | Yes | For all translation tasks, we follow the transformer_base_v2 hyper-parameter setting... which corresponds to a 6-layer transformer with a model size of 512. The parameters are initialized using a normal distribution with a mean of 0 and a variance of 6/(d_row + d_col)... All models are trained on 4 Tesla M40 GPUs for a total of 100K steps using the Adam... algorithm. The initial learning rate is set to 0.2 and decayed according to the schedule in Vaswani et al. (2017). During training, the batch size is set to approximately 4096 words per batch and checkpoints are created every 60 minutes. At test time, we use a beam of 8 and a length penalty of 1.0. Other hyper-parameters used in our approach are set as λ = 1, m = 1. (Hedged sketches of this setup appear below the table.) |
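
The initialization and learning-rate details quoted in the Experiment Setup row can be made concrete with a short sketch. This is an illustration under stated assumptions, not the authors' code: the warmup step count and the treatment of the 0.2 initial rate as a multiplicative scale are assumptions (the paper only states the initial rate and cites the decay schedule of Vaswani et al. (2017)), and `d_row`/`d_col` stand for the two dimensions of each weight matrix.

```python
import numpy as np

def init_weight(d_row, d_col, rng=np.random.default_rng(0)):
    """Normal initializer with mean 0 and variance 6 / (d_row + d_col),
    matching the description quoted in the Experiment Setup row."""
    std = np.sqrt(6.0 / (d_row + d_col))
    return rng.normal(loc=0.0, scale=std, size=(d_row, d_col))

def transformer_lr(step, d_model=512, warmup_steps=8000, scale=0.2):
    """Inverse-square-root decay with linear warmup from Vaswani et al. (2017).

    Assumptions: warmup_steps=8000 and applying the 0.2 initial rate as a
    scale factor; the paper only names the initial rate and the schedule."""
    step = max(step, 1)
    return scale * (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)

# Example: initialize a 512x2048 feed-forward weight matrix and query the
# learning rate at a few points of the 100K-step training run.
w = init_weight(512, 2048)
print(round(float(w.std()), 4))
print([round(transformer_lr(s), 6) for s in (1000, 8000, 100_000)])
```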
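
The Software Dependencies row notes that BLEU is reported with the Moses multi-bleu.perl script and SacreBLEU. Below is a minimal sketch of scoring a system output with the sacrebleu Python package; the file names are hypothetical, since the paper does not release its translation outputs.

```python
import sacrebleu  # pip install sacrebleu

# Hypothetical file names: detokenized system output and reference for
# newstest2017 (the WMT17 English-German test set used in the paper).
with open("newstest2017.hyp.de", encoding="utf-8") as f:
    hypotheses = [line.rstrip("\n") for line in f]
with open("newstest2017.ref.de", encoding="utf-8") as f:
    references = [line.rstrip("\n") for line in f]

# corpus_bleu takes the system output and a list of reference streams
# (one inner list per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(bleu.score)
```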