Unsupervised Hyper-alignment for Multilingual Word Embeddings
Authors: Jean Alaux, Edouard Grave, Marco Cuturi, Armand Joulin
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate our method by jointly aligning word vectors in eleven languages, showing consistent improvement with indirect mappings while maintaining competitive performance on direct word translation." (Section 5, Experimental Results) |
| Researcher Affiliation | Not disclosed | The reviewed version lists "Anonymous authors. Paper under double-blind review." |
| Pseudocode | No | The paper describes procedures using mathematical formulations and textual descriptions but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using a third-party package ("the python optimal transport package", POT, https://pot.readthedocs.io/en/stable/) but does not state that its own source code is available. |
| Open Datasets | Yes | "We use normalized fastText word vectors trained on the Wikipedia Corpus (Bojanowski et al., 2016). We evaluate on the MUSE test datasets (Conneau et al., 2017)." |
| Dataset Splits | No | The paper mentions using 'MUSE test datasets' and building a 'test set' for missing pairs, but does not provide specific details on train/validation splits for model training. |
| Hardware Specification | No | The paper gives only runtime figures ("UMH runs on a CPU with 10 threads in less than 10 minutes for a pair of languages and in 2 hours for 6 languages"; "training in 2h on a CPU instead of 5h on a GPU") without specifying CPU model, memory, or other hardware details. |
| Software Dependencies | No | "We use the python optimal transport package" (POT, https://pot.readthedocs.io/en/stable/). No specific version number is given for POT or any other software component. |
| Experiment Setup | Yes | We run the first epoch with a batch size of 500 and then set it to 1k. We set the learning rate to 0.1 for the ℓ2 loss and to 25 for the RCSLS loss in the multilingual setting, and to 50 in the bilingual setting. For the first two iterations, we learn the assignment with a regularized Sinkhorn. Then, for efficiency, we switch to a greedy assignment, by picking the max per row of the score matrix. We initialize with the Gromov-Wasserstein approach applied to the first 2k vectors and a regularization parameter ε of 0.5 (Peyré et al., 2016). We restrict each set of vectors to its first 20k elements. |
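
The setup quoted above can be pieced together into a small bilingual sketch using the POT package the paper cites. This is not the authors' code: the function and variable names, the Sinkhorn regularization value for the refinement steps, the plain Procrustes update used in place of the paper's RCSLS fine-tuning, and the omission of mini-batching (500 then 1k, as in the quote) are all assumptions made for illustration.

```python
# Minimal sketch of the alignment loop quoted above (bilingual case only).
# Assumptions not taken from the paper: all names, the Sinkhorn
# regularization value eps_sk, the plain Procrustes update (the paper
# fine-tunes with an RCSLS loss), and solving the assignment on the full
# vocabulary instead of mini-batches of 500 / 1k vectors.
import numpy as np
import ot  # Python Optimal Transport (POT), https://pot.readthedocs.io/en/stable/


def procrustes(X, Y, P):
    """Orthogonal W minimizing ||XW - PY||_F, via the SVD of X^T P Y."""
    U, _, Vt = np.linalg.svd(X.T @ (P @ Y))
    return U @ Vt


def align(X, Y, n_init=2000, eps_gw=0.5, eps_sk=0.05, n_iters=5):
    """Align source vectors X to target vectors Y (rows are unit-norm word vectors)."""
    # 1) Initialization: Gromov-Wasserstein on the first n_init vectors,
    #    entropic regularization eps_gw (2k vectors and 0.5 in the paper).
    Cx = 1.0 - X[:n_init] @ X[:n_init].T          # intra-language distances
    Cy = 1.0 - Y[:n_init] @ Y[:n_init].T
    u = np.full(n_init, 1.0 / n_init)
    P = ot.gromov.entropic_gromov_wasserstein(Cx, Cy, u, u,
                                               loss_fun='square_loss',
                                               epsilon=eps_gw)
    W = procrustes(X[:n_init], Y[:n_init], P)

    # 2) Alternate between solving the assignment and re-fitting W.
    n = X.shape[0]
    a = np.full(n, 1.0 / n)
    for it in range(n_iters):
        S = (X @ W) @ Y.T                          # score (similarity) matrix
        if it < 2:
            # First two iterations: regularized Sinkhorn assignment.
            M = 2.0 - 2.0 * S                      # squared l2 cost for unit vectors
            P = ot.sinkhorn(a, a, M, reg=eps_sk)
        else:
            # Afterwards: greedy assignment, max per row of the score matrix.
            P = np.zeros((n, n))
            P[np.arange(n), S.argmax(axis=1)] = 1.0 / n
        W = procrustes(X, Y, P)
    return W
```

With the paper's vocabulary size (the first 20k vectors per language) a dense n×n coupling becomes memory-heavy, which is presumably why the original procedure works on mini-batches; the sketch keeps the full matrix only for readability.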