Word translation without parallel data
Authors: Guillaume Lample, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that our method works very well also for distant language pairs... In this section, we empirically demonstrate the effectiveness of our unsupervised approach on several benchmarks, and compare it with state-of-the-art supervised methods. |
| Researcher Affiliation | Collaboration | {glample,aconneau,ranzato,rvj}@fb.com; ludovic.denoyer@upmc.fr. Equal contribution. Order has been determined with a coin flip. Facebook AI Research; Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6; LIUM, University of Le Mans |
| Pseudocode | No | The paper describes the algorithmic steps and equations within the main text, but it does not include any clearly labeled pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Our code, embeddings and dictionaries are publicly available1. 1 https://github.com/facebookresearch/MUSE |
| Open Datasets | Yes | We use unsupervised word vectors that were trained using fastText2. These correspond to monolingual embeddings of dimension 300 trained on Wikipedia corpora... We make these dictionaries publicly available as part of the MUSE library3... We use the SemEval 2017 competition data (Camacho-Collados et al. (2017))... Europarl corpus. |
| Dataset Splits | Yes | We divide the learning rate by 2 every time our unsupervised validation criterion decreases. We use it as a stopping criterion during training, and also for hyperparameter selection in all our experiments. Specifically, we consider the 10k most frequent source words, and use CSLS to generate a translation for each of them. We then compute the average cosine similarity between these deemed translations, and use this average as a validation metric. (A sketch of this criterion appears below the table.) |
| Hardware Specification | No | The paper does not specify the exact hardware components (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'fastText' as the tool used to train word vectors, and references 'deep adversarial networks of Goodfellow et al. (2014)', but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For our discriminator, we use a multilayer perceptron with two hidden layers of size 2048, and Leaky-ReLU activation functions. The input to the discriminator is corrupted with dropout noise with a rate of 0.1. We use stochastic gradient descent with a batch size of 32, a learning rate of 0.1 and a decay of 0.95 both for the discriminator and W. We divide the learning rate by 2 every time our unsupervised validation criterion decreases. (A PyTorch sketch of this setup appears below the table.) |
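The validation criterion quoted in the Dataset Splits row selects models by translating the 10k most frequent source words with CSLS (cross-domain similarity local scaling) and averaging the cosine similarity of the resulting pairs. Below is a minimal NumPy sketch of that metric, assuming the paper's neighborhood size K = 10; the function name `csls_translate` and the single dense similarity matrix are illustrative simplifications (MUSE batches this computation over large vocabularies), not the authors' exact implementation.

```python
import numpy as np

def csls_translate(src_emb, tgt_emb, k=10):
    """Translate each source word to its CSLS-nearest target word.

    CSLS(x, y) = 2*cos(x, y) - r_tgt(x) - r_src(y), where r_* are the
    mean cosine similarities to the k nearest neighbors in the other space.
    """
    # L2-normalize so dot products are cosine similarities.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = src @ tgt.T                      # (n_src, n_tgt) cosine matrix
    # r_tgt(x): mean similarity of each source word to its k nearest targets.
    r_tgt = np.sort(sims, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    # r_src(y): mean similarity of each target word to its k nearest sources.
    r_src = np.sort(sims, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    csls = 2 * sims - r_tgt - r_src
    best = csls.argmax(axis=1)              # CSLS-nearest target per source
    # Validation metric: mean cosine similarity of the deemed translations.
    return best, float((src * tgt[best]).sum(axis=1).mean())

# Usage, following the quoted criterion: restrict to the 10k most frequent
# source words after applying the learned mapping W (hypothetical arrays).
# mapped_src = src_vectors[:10000] @ W.T
# translations, val_metric = csls_translate(mapped_src, tgt_vectors)
```

The metric requires no parallel data: it is computed entirely from the two embedding spaces, which is what makes it usable both as a stopping criterion and for hyperparameter selection in the unsupervised setting.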
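The Experiment Setup row specifies the discriminator and optimizer closely enough to sketch in PyTorch. The sketch below follows the quoted hyperparameters (two hidden layers of size 2048, input dropout 0.1, SGD with lr 0.1 for both the discriminator and W); the LeakyReLU slope of 0.2 and the scheduler-based handling of the 0.95 decay are assumptions, since the quoted text does not pin them down.

```python
import torch
import torch.nn as nn

EMB_DIM = 300  # dimension of the fastText vectors used in the paper

# Discriminator as described: two hidden layers of size 2048, Leaky-ReLU
# activations, dropout noise (rate 0.1) applied to the input, and a
# sigmoid output giving the probability that an input is a target embedding.
discriminator = nn.Sequential(
    nn.Dropout(0.1),
    nn.Linear(EMB_DIM, 2048), nn.LeakyReLU(0.2),  # slope 0.2 is an assumption
    nn.Linear(2048, 2048), nn.LeakyReLU(0.2),
    nn.Linear(2048, 1), nn.Sigmoid(),
)

# The mapping W is a single linear map from source to target space.
W = nn.Linear(EMB_DIM, EMB_DIM, bias=False)

# SGD with lr 0.1 for both players; the "decay of 0.95" is sketched here as
# a per-epoch exponential learning-rate decay (an assumption).
d_opt = torch.optim.SGD(discriminator.parameters(), lr=0.1)
w_opt = torch.optim.SGD(W.parameters(), lr=0.1)
d_sched = torch.optim.lr_scheduler.ExponentialLR(d_opt, gamma=0.95)
w_sched = torch.optim.lr_scheduler.ExponentialLR(w_opt, gamma=0.95)

# Per the paper: additionally halve the learning rate whenever the
# unsupervised validation criterion decreases, and use that criterion
# as the stopping rule. Training batches are of size 32.
```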