Unsupervised Machine Translation Using Monolingual Corpora Only

Authors: Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate our model on two widely used datasets and two language pairs, reporting BLEU scores of 32.8 and 15.1 on the Multi30k and WMT English-French datasets, without using even a single parallel sentence at training time."
Researcher Affiliation | Collaboration | "Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato. Facebook AI Research; Sorbonne Universités, UPMC Univ Paris 06, LIP6 UMR 7606, CNRS. {gl,aconneau,ranzato}@fb.com, ludovic.denoyer@lip6.fr"
Pseudocode | Yes | "Algorithm 1 Unsupervised Training for Machine Translation" (a hedged training-loop sketch follows the table)
Open Source Code | No | "We will release the code to the public once the revision process is over."
Open Datasets | Yes | "WMT'14 English-French... WMT'16 English-German... Multi30k-Task1: The task 1 of the Multi30k dataset (Elliott et al., 2016)..."
Dataset Splits | Yes | "The validation set is comprised of 3,000 English and French sentences extracted from our monolingual training corpora described above... For both pairs of languages and similarly to the WMT datasets above, we split the training and validation sets into monolingual corpora, resulting in 14,500 monolingual source and target sentences in the training set, and 500 sentences in the validation set." (a split sketch follows the table)
Hardware Specification | No | The paper does not provide hardware details such as the GPU/CPU models or processor types used for its experiments.
Software Dependencies | No | The paper mentions "fastText" and the optimizers "Adam" and "RMSProp", but gives no version numbers for any software dependency.
Experiment Setup | Yes | "The embedding and LSTM hidden state dimensions are all set to 300... we found p_wd = 0.1 and k = 3 to be good parameters... trained using Adam (Kingma & Ba, 2014), with a learning rate of 0.0003, β1 = 0.5, and a mini-batch size of 32. The discriminator is trained using RMSProp (Tieleman & Hinton, 2012) with a learning rate of 0.0005. We evenly alternate between one encoder-decoder and one discriminator update. We set λ_auto = λ_cd = λ_adv = 1." (noise-model and optimizer sketches follow the table)
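The Dataset Splits row quotes exact sizes (14,500 training and 500 validation sentences per language for Multi30k, i.e. half of the original parallel pairs per side). The sketch below shows one way to realize that kind of split: shuffle the parallel pairs, keep only the source side of one half and only the target side of the other, so the two monolingual corpora never share a sentence pair. The function name and the toy data are illustrative, not taken from the paper.

```python
# A minimal sketch of turning a parallel corpus into two disjoint monolingual
# corpora, consistent with the quoted split sizes (e.g. Multi30k training pairs
# -> 14,500 English-only and 14,500 French-only sentences). Illustrative only.
import random

def make_monolingual_split(pairs, seed=0):
    """Shuffle parallel (src, tgt) pairs, keep the source side of the first
    half and the target side of the second half, so no sentence pair is ever
    seen in both languages."""
    rng = random.Random(seed)
    pairs = list(pairs)
    rng.shuffle(pairs)
    half = len(pairs) // 2
    src_only = [s for s, _ in pairs[:half]]
    tgt_only = [t for _, t in pairs[half:]]
    return src_only, tgt_only

# Toy usage with made-up sentence pairs.
toy = [(f"en sentence {i}", f"fr phrase {i}") for i in range(10)]
en_mono, fr_mono = make_monolingual_split(toy)
print(len(en_mono), len(fr_mono))  # 5 5
```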
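The Experiment Setup row quotes p_wd = 0.1 and k = 3, the parameters of the noise model C(x) used for denoising auto-encoding: each word is dropped with probability p_wd and the sentence is locally shuffled so that no word moves more than k positions. Below is a minimal sketch of such a corruption function; the offset-and-sort shuffle (add a uniform offset in [0, k+1) to each index, then sort) is a standard way to bound displacements, and the function name corrupt is illustrative.

```python
# Minimal sketch of the word-drop + local-shuffle noise the quoted p_wd and k
# parameters refer to. Not the authors' code; values come from the setup row.
import random

def corrupt(sentence, p_wd=0.1, k=3):
    # Word drop: remove each token independently with probability p_wd.
    kept = [w for w in sentence if random.random() >= p_wd]
    # Local shuffle: perturb each index by a small uniform offset, then sort,
    # so no token moves more than k positions from its original slot.
    keyed = [(i + random.uniform(0, k + 1), w) for i, w in enumerate(kept)]
    return [w for _, w in sorted(keyed)]

print(corrupt("the cat sat on the mat".split()))
```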
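The Pseudocode row refers to Algorithm 1, which alternates one encoder-decoder update (denoising auto-encoding loss + cross-domain/back-translation loss + adversarial loss, weighted by λ_auto, λ_cd, λ_adv) with one discriminator update, using the optimizer settings quoted in the Experiment Setup row. The PyTorch sketch below shows only that control flow: the actual model is a sequence-to-sequence LSTM with cross-entropy reconstruction losses, whereas here tiny linear layers, MSE losses over 300-dimensional "sentence codes", and an identity back-translation stand in so the example stays short and runnable. All of those stand-ins, and β2 left at Adam's default, are assumptions, not the authors' implementation.

```python
# Hypothetical skeleton of one training step in the spirit of Algorithm 1:
# weighted auto-encoding + cross-domain + adversarial losses for the
# encoder-decoder, alternated with a discriminator update on the latent codes.
import torch
import torch.nn as nn

DIM = 300                       # embedding / hidden size quoted in the paper
enc = nn.Linear(DIM, DIM)       # stand-in for the shared LSTM encoder
dec = nn.Linear(DIM, DIM)       # stand-in for the shared LSTM decoder
disc = nn.Sequential(nn.Linear(DIM, 1024), nn.LeakyReLU(0.2), nn.Linear(1024, 1))

# Optimizers and loss weights as quoted in the Experiment Setup row.
opt_ed = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()),
                          lr=0.0003, betas=(0.5, 0.999))   # β2 left at default
opt_d = torch.optim.RMSprop(disc.parameters(), lr=0.0005)
lambda_auto = lambda_cd = lambda_adv = 1.0
bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()              # stand-in for token-level cross-entropy

def train_step(src_batch, tgt_batch, back_translate):
    """One encoder-decoder update followed by one discriminator update."""
    # --- encoder-decoder update -------------------------------------------
    losses = []
    for lang_id, batch in ((0, src_batch), (1, tgt_batch)):
        z_auto = enc(batch)                          # encode (noise step omitted here)
        losses.append(lambda_auto * mse(dec(z_auto), batch))
        z_cd = enc(back_translate(batch, lang_id))   # encode translation from previous model
        losses.append(lambda_cd * mse(dec(z_cd), batch))
        # Adversarial term: encoder tries to make the latent look like the other language.
        fool = torch.full((batch.size(0), 1), float(1 - lang_id))
        losses.append(lambda_adv * bce(disc(z_auto), fool))
    opt_ed.zero_grad()
    sum(losses).backward()
    opt_ed.step()

    # --- discriminator update ----------------------------------------------
    d_loss = 0.0
    for lang_id, batch in ((0, src_batch), (1, tgt_batch)):
        label = torch.full((batch.size(0), 1), float(lang_id))
        d_loss = d_loss + bce(disc(enc(batch).detach()), label)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

# Toy usage: mini-batches of 32 (as quoted) 300-d codes, identity back-translation.
src, tgt = torch.randn(32, DIM), torch.randn(32, DIM)
train_step(src, tgt, back_translate=lambda b, lang: b)
```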