Dual Learning for Machine Translation
Authors: Di He, Yingce Xia, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that dual-NMT works very well on English↔French translation; especially, by learning from monolingual data (with 10% bilingual data for warm start), it achieves a comparable accuracy to NMT trained from the full bilingual data for the French-to-English translation task. |
| Researcher Affiliation | Collaboration | (1) Key Laboratory of Machine Perception (MOE), School of EECS, Peking University; (2) University of Science and Technology of China; (3) Microsoft Research |
| Pseudocode | Yes | Algorithm 1: The dual-learning algorithm (a hedged sketch of this loop appears after the table) |
| Open Source Code | No | The paper states 'We leverage a tutorial NMT system implemented by Theano for all the experiments. dl4mt-tutorial: https://github.com/nyu-dl'. This link points to the tutorial system the authors built on, not to an open-source implementation of the dual-NMT method described in the paper. |
| Open Datasets | Yes | In detail, we used the same bilingual corpora from WMT 14 as used in [1, 5], which contains 12M sentence pairs extracted from five datasets: Europarl v7, Common Crawl corpus, UN corpus, News Commentary, and the 10^9 French-English corpus. ... We used the News Crawl: articles from 2012 provided by WMT 14 as monolingual data. |
| Dataset Splits | Yes | Following common practices, we concatenated newstest2012 and newstest2013 as the validation set, and used newstest2014 as the testing set. |
| Hardware Specification | Yes | Each of the baseline models was trained with AdaDelta [15] on a K40m GPU until its performance stopped improving on the validation set. |
| Software Dependencies | No | The paper mentions Theano (for the NMT implementation) and the AdaDelta optimizer, but does not provide version numbers for these or any other software components. |
| Experiment Setup | Yes | We used the GRU networks and followed the practice in [1] to set experimental parameters. ... Each word was projected into a continuous vector space of 620 dimensions, and the dimension of the recurrent unit was 1000. We removed sentences with more than 50 words from the training set. Batch size was set as 80 with 20 batches pre-fetched and sorted by sentence lengths. ... trained with AdaDelta [15]... We set the beam search size to be 2 in the middle translation process. ... during testing we used beam search [12] with beam size of 12 for all the algorithms as in many previous works. (These values are collected into a config sketch after the table.) |
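
The Pseudocode row refers to Algorithm 1, the two-agent dual-learning game between the A→B and B→A translation models. Below is a minimal Python sketch of one A→B round of that game, assuming hypothetical model interfaces (`translate_k`, `log_prob`, `score`, `policy_gradient_step`) that are not taken from the authors' code; the beam size of 2 for the middle translation step matches the setup quoted above, while the reward weight `alpha` is left as a free parameter.

```python
def dual_learning_round(sent_A, model_AB, model_BA, lm_B,
                        alpha=0.5, K=2, lr=1e-4):
    """One A->B round of the dual-learning game (sketch, not the authors' code).

    Assumed (hypothetical) interfaces:
      model_AB.translate_k(s, K)                 -> K beam-search translations of s
      model_XY.log_prob(src, tgt)                -> log P(tgt | src; theta_XY)
      lm_B.score(s)                              -> language-model reward for sentence s
      model_XY.policy_gradient_step(src, tgt, weight, lr)
          -> one gradient-ascent step on weight * log P(tgt | src; theta_XY)
    """
    # 1. Translate the monolingual sentence into language B
    #    (beam size 2 in the middle translation, as quoted in the table).
    mids = model_AB.translate_k(sent_A, K)

    for s_mid in mids:
        # 2. Language-model reward: how natural the translation reads in language B.
        r_lm = lm_B.score(s_mid)
        # 3. Communication (reconstruction) reward: can the reverse model
        #    recover the original sentence from the translation?
        r_comm = model_BA.log_prob(s_mid, sent_A)
        # 4. Combine the two rewards with weight alpha.
        r = alpha * r_lm + (1.0 - alpha) * r_comm
        # 5. REINFORCE-style updates: the forward model is reinforced with the
        #    total reward; the reverse model is updated on the reconstruction term.
        model_AB.policy_gradient_step(sent_A, s_mid, weight=r / K, lr=lr)
        model_BA.policy_gradient_step(s_mid, sent_A, weight=(1.0 - alpha) / K, lr=lr)
```

The symmetric B→A round is identical with the roles of the two translation models, language models, and monolingual corpora swapped.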
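
For convenience, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch; the keys are illustrative names, not identifiers from the authors' implementation.

```python
# Hyperparameters quoted in the Experiment Setup row (key names are illustrative).
EXPERIMENT_CONFIG = {
    "recurrent_unit": "GRU",
    "word_embedding_dim": 620,       # each word projected into a 620-dim space
    "recurrent_hidden_dim": 1000,    # dimension of the recurrent unit
    "max_train_sentence_len": 50,    # longer sentences removed from training
    "batch_size": 80,
    "prefetched_batches": 20,        # pre-fetched and sorted by sentence length
    "optimizer": "AdaDelta",
    "mid_translation_beam_size": 2,  # beam size in the middle translation step
    "test_beam_size": 12,            # beam search size at test time
}
```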