Unsupervised Neural Machine Translation

Authors: Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our model builds upon the recent work on unsupervised embedding mappings, and consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation. Despite the simplicity of the approach, our system obtains 15.56 and 10.21 BLEU points in WMT 2014 French-to-English and German-to-English translation. (An illustrative sketch of this denoising-plus-backtranslation training is given after the table.)
Researcher Affiliation | Academia | Mikel Artetxe, Gorka Labaka & Eneko Agirre, IXA NLP Group, University of the Basque Country (UPV/EHU), {mikel.artetxe,gorka.labaka,e.agirre}@ehu.eus; Kyunghyun Cho, New York University, CIFAR Azrieli Global Scholar, kyunghyun.cho@nyu.edu
Pseudocode | No | The paper describes the proposed method and training procedure in narrative text and with a system architecture diagram (Figure 1), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our implementation is released as an open source project (https://github.com/artetxem/undreamt).
Open Datasets | Yes | we used the News Crawl corpus with articles from 2007 to 2013. [...] WMT 2014 shared task (http://www.statmt.org/wmt14/translation-task.html).
Dataset Splits | No | The paper mentions using WMT 2014 data for training and evaluating on newstest2014, but it does not specify a training/validation/test split (e.g., percentages or sample counts) for the primary datasets used in the reported experiments. It only states that hyperparameters were decided using Spanish-English WMT data in preliminary experiments, without providing split details for that either.
Hardware Specification | Yes | Using our PyTorch implementation, training each system took about 4-5 days on a single Titan X GPU for the full unsupervised variant.
Software Dependencies | No | The paper mentions using a 'PyTorch implementation', 'standard Moses tools', and an 'implementation provided by the authors' for BPE, but it does not specify version numbers for PyTorch or other key software libraries and tools required for replication.
Experiment Setup | Yes | The training of the proposed system itself is done using the procedure described in Section 3.2 with the cross-entropy loss function and a batch size of 50 sentences. [...] We use Adam as our optimizer with a learning rate of α = 0.0002 (Kingma & Ba, 2015). During training, we use dropout regularization with a drop probability p = 0.3. Given that we restrict ourselves not to use any parallel data for development purposes, we perform a fixed number of iterations (300,000) to train each variant. (A short training-driver sketch after the table mirrors these settings.)
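
The "Research Type" row quotes the core of the method: an attentional encoder-decoder trained on monolingual corpora alone by combining denoising and backtranslation. The sketch below is a minimal illustration of that training procedure, not the authors' released undreamt code: a toy GRU encoder-decoder with a shared encoder and per-language decoders (as described in the paper) alternates (i) denoising, reconstructing a sentence from a copy corrupted by swapping contiguous tokens, and (ii) on-the-fly backtranslation, translating a sentence with the current model and then learning to recover the original. The names (Seq2Seq, add_noise, training_step), the dimensions, the toy vocabulary, and the greedy decoder are assumptions; attention, BPE, and the fixed cross-lingual embedding initialization are omitted.

```python
# Illustrative sketch only; not the authors' released implementation.
import random
import torch
import torch.nn as nn

PAD, BOS, EOS, VOCAB_SIZE = 0, 1, 2, 1000  # toy vocabulary for illustration


class Seq2Seq(nn.Module):
    """Tiny GRU encoder-decoder standing in for the paper's attentional model:
    a shared encoder and one decoder per language (attention omitted)."""

    def __init__(self, dim=64):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, dim, padding_idx=PAD)
        self.drop = nn.Dropout(p=0.3)                      # drop probability from the paper
        self.encoder = nn.GRU(dim, dim, batch_first=True)  # shared across languages
        self.decoders = nn.ModuleDict({
            "l1": nn.GRU(dim, dim, batch_first=True),
            "l2": nn.GRU(dim, dim, batch_first=True),
        })
        self.out = nn.Linear(dim, VOCAB_SIZE)

    def forward(self, src, tgt_in, tgt_lang):
        _, h = self.encoder(self.drop(self.emb(src)))
        dec_out, _ = self.decoders[tgt_lang](self.drop(self.emb(tgt_in)), h)
        return self.out(dec_out)

    @torch.no_grad()
    def greedy_translate(self, src, tgt_lang, max_len=20):
        """Greedy decoding used to create synthetic pairs for backtranslation."""
        _, h = self.encoder(self.emb(src))
        tok = torch.full((src.size(0), 1), BOS, dtype=torch.long)
        generated = []
        for _ in range(max_len):
            dec_out, h = self.decoders[tgt_lang](self.emb(tok), h)
            tok = self.out(dec_out).argmax(dim=-1)
            generated.append(tok)
        return torch.cat(generated, dim=1)


def add_noise(tokens):
    """Corrupt a token sequence with random swaps of contiguous tokens
    (the denoising corruption described in the paper)."""
    tokens = list(tokens)
    for _ in range(len(tokens) // 2):
        i = random.randrange(len(tokens) - 1)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return tokens


def training_step(model, optimizer, criterion, batch, lang, other_lang):
    """One denoising update plus one backtranslation update for `lang`.
    `batch` holds monolingual sentences as BOS + tokens + EOS rows."""
    # Denoising: reconstruct the clean sentence from its corrupted copy.
    noisy = torch.tensor([[s[0]] + add_noise(s[1:-1]) + [s[-1]] for s in batch.tolist()])
    logits = model(noisy, batch[:, :-1], lang)
    loss = criterion(logits.reshape(-1, VOCAB_SIZE), batch[:, 1:].reshape(-1))

    # Backtranslation: translate with the current model (no gradients), then
    # train to recover the original sentence from the synthetic translation.
    synthetic = model.greedy_translate(batch, other_lang)
    logits = model(synthetic, batch[:, :-1], lang)
    loss = loss + criterion(logits.reshape(-1, VOCAB_SIZE), batch[:, 1:].reshape(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```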
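
The "Experiment Setup" row pins down the reported hyperparameters: cross-entropy loss, batch size 50, Adam with α = 0.0002, dropout p = 0.3 (already baked into the toy model above), and a fixed budget of 300,000 iterations because no parallel development data is used. The driver below is a hedged sketch of that schedule; it reuses the Seq2Seq, add_noise, and training_step definitions from the previous block, and the random toy_batch is a placeholder for real monolingual News Crawl batches.

```python
# Schedule from the quoted setup; toy data, reuses definitions from the sketch above.
BATCH_SIZE, NUM_ITERATIONS, SEQ_LEN = 50, 300_000, 12

model = Seq2Seq()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)   # alpha = 0.0002
criterion = nn.CrossEntropyLoss(ignore_index=PAD)           # cross-entropy loss


def toy_batch():
    """Placeholder monolingual batch: BOS + random tokens + EOS."""
    body = torch.randint(3, VOCAB_SIZE, (BATCH_SIZE, SEQ_LEN))
    bos = torch.full((BATCH_SIZE, 1), BOS, dtype=torch.long)
    eos = torch.full((BATCH_SIZE, 1), EOS, dtype=torch.long)
    return torch.cat([bos, body, eos], dim=1)


# Fixed number of iterations; no validation-based early stopping is possible
# because no parallel development data is used.
for step in range(NUM_ITERATIONS):
    loss_l1 = training_step(model, optimizer, criterion, toy_batch(), "l1", "l2")
    loss_l2 = training_step(model, optimizer, criterion, toy_batch(), "l2", "l1")
    if step % 1000 == 0:
        print(f"step {step}: L1 loss {loss_l1:.3f}, L2 loss {loss_l2:.3f}")
```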