Unsupervised Hyper-alignment for Multilingual Word Embeddings
Authors: Jean Alaux, Edouard Grave, Marco Cuturi, Armand Joulin
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate our method by jointly aligning word vectors in eleven languages, showing consistent improvement with indirect mappings while maintaining competitive performance on direct word translation." (Section 5, Experimental Results) |
| Researcher Affiliation | Not disclosed | The reviewed version lists "Anonymous authors. Paper under double-blind review." |
| Pseudocode | No | The paper describes procedures using mathematical formulations and textual descriptions but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using a third-party package ("the python optimal transport package", POT, https://pot.readthedocs.io/en/stable/) but does not state that its own source code is available. |
| Open Datasets | Yes | "We use normalized fastText word vectors trained on the Wikipedia Corpus (Bojanowski et al., 2016). We evaluate on the MUSE test datasets (Conneau et al., 2017)." |
| Dataset Splits | No | The paper mentions using 'MUSE test datasets' and building a 'test set' for missing pairs, but does not provide specific details on train/validation splits for model training. |
| Hardware Specification | No | The paper gives only runtime figures ("UMH runs on a CPU with 10 threads in less than 10 minutes for a pair of languages and in 2 hours for 6 languages"; "training in 2h on a CPU instead of 5h on a GPU") without specifying CPU model, memory, or other hardware details. |
| Software Dependencies | No | "We use the python optimal transport package" (POT, https://pot.readthedocs.io/en/stable/). No specific version number is given for POT or any other software component. |
| Experiment Setup | Yes | We run the first epoch with a batch size of 500 and then set it to 1k. We set the learning rate to 0.1 for the ℓ2 loss and to 25 for the RCSLS loss in the multilingual setting, and to 50 in the bilingual setting. For the first two iterations, we learn the assignment with a regularized Sinkhorn. Then, for efficiency, we switch to a greedy assignment, by picking the max per row of the score matrix. We initialize with the Gromov-Wasserstein approach applied to the first 2k vectors and a regularization parameter ε of 0.5 (Peyré et al., 2016). We restrict each set of vectors to its first 20k elements. |
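
The setup quoted above can be pieced together into a small bilingual sketch using the POT package the paper cites. This is not the authors' code: the function and variable names, the Sinkhorn regularization value for the refinement steps, the plain Procrustes update used in place of the paper's RCSLS fine-tuning, and the omission of mini-batching (500 then 1k, as in the quote) are all assumptions made for illustration.

```python
# Minimal sketch of the alignment loop quoted above (bilingual case only).
# Assumptions not taken from the paper: all names, the Sinkhorn
# regularization value eps_sk, the plain Procrustes update (the paper
# fine-tunes with an RCSLS loss), and solving the assignment on the full
# vocabulary instead of mini-batches of 500 / 1k vectors.
import numpy as np
import ot  # Python Optimal Transport (POT), https://pot.readthedocs.io/en/stable/


def procrustes(X, Y, P):
    """Orthogonal W minimizing ||XW - PY||_F, via the SVD of X^T P Y."""
    U, _, Vt = np.linalg.svd(X.T @ (P @ Y))
    return U @ Vt


def align(X, Y, n_init=2000, eps_gw=0.5, eps_sk=0.05, n_iters=5):
    """Align source vectors X to target vectors Y (rows are unit-norm word vectors)."""
    # 1) Initialization: Gromov-Wasserstein on the first n_init vectors,
    #    entropic regularization eps_gw (2k vectors and 0.5 in the paper).
    Cx = 1.0 - X[:n_init] @ X[:n_init].T          # intra-language distances
    Cy = 1.0 - Y[:n_init] @ Y[:n_init].T
    u = np.full(n_init, 1.0 / n_init)
    P = ot.gromov.entropic_gromov_wasserstein(Cx, Cy, u, u,
                                               loss_fun='square_loss',
                                               epsilon=eps_gw)
    W = procrustes(X[:n_init], Y[:n_init], P)

    # 2) Alternate between solving the assignment and re-fitting W.
    n = X.shape[0]
    a = np.full(n, 1.0 / n)
    for it in range(n_iters):
        S = (X @ W) @ Y.T                          # score (similarity) matrix
        if it < 2:
            # First two iterations: regularized Sinkhorn assignment.
            M = 2.0 - 2.0 * S                      # squared l2 cost for unit vectors
            P = ot.sinkhorn(a, a, M, reg=eps_sk)
        else:
            # Afterwards: greedy assignment, max per row of the score matrix.
            P = np.zeros((n, n))
            P[np.arange(n), S.argmax(axis=1)] = 1.0 / n
        W = procrustes(X, Y, P)
    return W
```

With the paper's vocabulary size (the first 20k vectors per language) a dense n×n coupling becomes memory-heavy, which is presumably why the original procedure works on mini-batches; the sketch keeps the full matrix only for readability.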