Unsupervised Machine Translation Using Monolingual Corpora Only

Authors: Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate our model on two widely used datasets and two language pairs, reporting BLEU scores of 32.8 and 15.1 on the Multi30k and WMT English-French datasets, without using even a single parallel sentence at training time."
Researcher Affiliation | Collaboration | "Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato. Facebook AI Research; Sorbonne Universités, UPMC Univ Paris 06, LIP6 UMR 7606, CNRS. {gl,aconneau,ranzato}@fb.com, ludovic.denoyer@lip6.fr"
Pseudocode | Yes | "Algorithm 1 Unsupervised Training for Machine Translation" (a hedged training-loop sketch follows the table)
Open Source Code | No | "We will release the code to the public once the revision process is over."
Open Datasets | Yes | "WMT'14 English-French... WMT'16 English-German... Multi30k-Task1: The task 1 of the Multi30k dataset (Elliott et al., 2016)..."
Dataset Splits | Yes | "The validation set is comprised of 3,000 English and French sentences extracted from our monolingual training corpora described above... For both pairs of languages and similarly to the WMT datasets above, we split the training and validation sets into monolingual corpora, resulting in 14,500 monolingual source and target sentences in the training set, and 500 sentences in the validation set." (a split sketch follows the table)
Hardware Specification | No | The paper does not provide hardware details such as the GPU/CPU models or processor types used for its experiments.
Software Dependencies | No | The paper mentions "fastText" and the optimizers "Adam" and "RMSProp", but gives no version numbers for any software dependency.
Experiment Setup | Yes | "The embedding and LSTM hidden state dimensions are all set to 300... we found p_wd = 0.1 and k = 3 to be good parameters... trained using Adam (Kingma & Ba, 2014), with a learning rate of 0.0003, β1 = 0.5, and a mini-batch size of 32. The discriminator is trained using RMSProp (Tieleman & Hinton, 2012) with a learning rate of 0.0005. We evenly alternate between one encoder-decoder and one discriminator update. We set λ_auto = λ_cd = λ_adv = 1." (noise-model and optimizer sketches follow the table)
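The Dataset Splits row quotes exact sizes (14,500 training and 500 validation sentences per language for Multi30k, i.e. half of the original parallel pairs per side). The sketch below shows one way to realize that kind of split: shuffle the parallel pairs, keep only the source side of one half and only the target side of the other, so the two monolingual corpora never share a sentence pair. The function name and the toy data are illustrative, not taken from the paper.

```python
# A minimal sketch of turning a parallel corpus into two disjoint monolingual
# corpora, consistent with the quoted split sizes (e.g. Multi30k training pairs
# -> 14,500 English-only and 14,500 French-only sentences). Illustrative only.
import random

def make_monolingual_split(pairs, seed=0):
    """Shuffle parallel (src, tgt) pairs, keep the source side of the first
    half and the target side of the second half, so no sentence pair is ever
    seen in both languages."""
    rng = random.Random(seed)
    pairs = list(pairs)
    rng.shuffle(pairs)
    half = len(pairs) // 2
    src_only = [s for s, _ in pairs[:half]]
    tgt_only = [t for _, t in pairs[half:]]
    return src_only, tgt_only

# Toy usage with made-up sentence pairs.
toy = [(f"en sentence {i}", f"fr phrase {i}") for i in range(10)]
en_mono, fr_mono = make_monolingual_split(toy)
print(len(en_mono), len(fr_mono))  # 5 5
```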
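The Experiment Setup row quotes p_wd = 0.1 and k = 3, the parameters of the noise model C(x) used for denoising auto-encoding: each word is dropped with probability p_wd and the sentence is locally shuffled so that no word moves more than k positions. Below is a minimal sketch of such a corruption function; the offset-and-sort shuffle (add a uniform offset in [0, k+1) to each index, then sort) is a standard way to bound displacements, and the function name corrupt is illustrative.

```python
# Minimal sketch of the word-drop + local-shuffle noise the quoted p_wd and k
# parameters refer to. Not the authors' code; values come from the setup row.
import random

def corrupt(sentence, p_wd=0.1, k=3):
    # Word drop: remove each token independently with probability p_wd.
    kept = [w for w in sentence if random.random() >= p_wd]
    # Local shuffle: perturb each index by a small uniform offset, then sort,
    # so no token moves more than k positions from its original slot.
    keyed = [(i + random.uniform(0, k + 1), w) for i, w in enumerate(kept)]
    return [w for _, w in sorted(keyed)]

print(corrupt("the cat sat on the mat".split()))
```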
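The Pseudocode row refers to Algorithm 1, which alternates one encoder-decoder update (denoising auto-encoding loss + cross-domain/back-translation loss + adversarial loss, weighted by λ_auto, λ_cd, λ_adv) with one discriminator update, using the optimizer settings quoted in the Experiment Setup row. The PyTorch sketch below shows only that control flow: the actual model is a sequence-to-sequence LSTM with cross-entropy reconstruction losses, whereas here tiny linear layers, MSE losses over 300-dimensional "sentence codes", and an identity back-translation stand in so the example stays short and runnable. All of those stand-ins, and β2 left at Adam's default, are assumptions, not the authors' implementation.

```python
# Hypothetical skeleton of one training step in the spirit of Algorithm 1:
# weighted auto-encoding + cross-domain + adversarial losses for the
# encoder-decoder, alternated with a discriminator update on the latent codes.
import torch
import torch.nn as nn

DIM = 300                       # embedding / hidden size quoted in the paper
enc = nn.Linear(DIM, DIM)       # stand-in for the shared LSTM encoder
dec = nn.Linear(DIM, DIM)       # stand-in for the shared LSTM decoder
disc = nn.Sequential(nn.Linear(DIM, 1024), nn.LeakyReLU(0.2), nn.Linear(1024, 1))

# Optimizers and loss weights as quoted in the Experiment Setup row.
opt_ed = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()),
                          lr=0.0003, betas=(0.5, 0.999))   # β2 left at default
opt_d = torch.optim.RMSprop(disc.parameters(), lr=0.0005)
lambda_auto = lambda_cd = lambda_adv = 1.0
bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()              # stand-in for token-level cross-entropy

def train_step(src_batch, tgt_batch, back_translate):
    """One encoder-decoder update followed by one discriminator update."""
    # --- encoder-decoder update -------------------------------------------
    losses = []
    for lang_id, batch in ((0, src_batch), (1, tgt_batch)):
        z_auto = enc(batch)                          # encode (noise step omitted here)
        losses.append(lambda_auto * mse(dec(z_auto), batch))
        z_cd = enc(back_translate(batch, lang_id))   # encode translation from previous model
        losses.append(lambda_cd * mse(dec(z_cd), batch))
        # Adversarial term: encoder tries to make the latent look like the other language.
        fool = torch.full((batch.size(0), 1), float(1 - lang_id))
        losses.append(lambda_adv * bce(disc(z_auto), fool))
    opt_ed.zero_grad()
    sum(losses).backward()
    opt_ed.step()

    # --- discriminator update ----------------------------------------------
    d_loss = 0.0
    for lang_id, batch in ((0, src_batch), (1, tgt_batch)):
        label = torch.full((batch.size(0), 1), float(lang_id))
        d_loss = d_loss + bce(disc(enc(batch).detach()), label)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

# Toy usage: mini-batches of 32 (as quoted) 300-d codes, identity back-translation.
src, tgt = torch.randn(32, DIM), torch.randn(32, DIM)
train_step(src, tgt, back_translate=lambda b, lang: b)
```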