Accessing Higher Dimensions for Unsupervised Word Translation

Authors: Sida Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results show that unsupervised translation can be achieved more easily and robustly than previously thought: less than 80MB of data and minutes of CPU time are required to achieve over 50% accuracy for English to Finnish, Hungarian, and Chinese translations when trained in the same domain; even under domain mismatch, the method still works fully unsupervised on English News Crawl to Chinese Wikipedia and English Europarl to Spanish Wikipedia, among others.
Researcher Affiliation | Industry | Sida I. Wang, FAIR, Meta
Pseudocode | Yes | Algorithm 1: coocmap self-learning; Algorithm 2: vecmap self-learning (see the self-learning sketch after the table)
Open Source Code | Yes | Code released at https://github.com/facebookresearch/coocmap
Open Datasets | Yes | For training data we use Wikipedia (wiki), Europarl (parl), and News Crawl (news)... wiki (https://dumps.wikimedia.org/): Wikipedia downloaded directly from the official dumps (pages-meta-current), text extracted using WikiExtractor (Attardi, 2015)... parl (https://www.statmt.org/europarl/): Europarl (Koehn, 2005)... news (https://data.statmt.org/news-crawl/): News Crawl 2019.es
Dataset Splits | No | The paper describes using the 'full MUSE dictionary' for evaluating results and mentions training data sources, but it does not specify any explicit validation splits or split methodology.
Hardware Specification | No | The paper mentions 'minutes of CPU time' and discusses computational complexity in terms of FLOPS, but it does not specify concrete hardware details such as CPU or GPU models or the machine configurations used for the experiments.
Software Dependencies | No | The paper mentions software such as fasttext (Bojanowski et al., 2017), the Huggingface WordLevel tokenizer, and jieba (for Chinese segmentation), but does not provide version numbers for any of these dependencies.
Experiment Setup | Yes | Each point in the scatter plots represents an experiment where a specific amount of data was taken from the head of the file for training co-occurrence matrices and fasttext vectors, with default settings for fasttext (skipgram, 300 dimensions; more in Appendix B). coocmap uses the same window size as fasttext (m = 5), the same CSLS (k = 10), and the same optimization parameters as vecmap. In the main results, we used default parameters, where the important ones were skipgram, lr: 0.05, dim: 300, epoch: 5. The learning rate was slowed as 0.1(d/50)^(-1/2) to account for observed instability in higher dimensions, and the epoch count was increased to 5(300/|D|)^(1/2) for data size |D| in MB, to run more epochs on smaller data (see the training sketch after the table).
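
Since the pseudocode row only names the two algorithms, a sketch of their shared shape may help. Both Algorithm 1 (coocmap self-learning) and Algorithm 2 (vecmap self-learning) instantiate the standard self-learning template: induce a dictionary from the current representations with CSLS, refit the alignment on that dictionary, and repeat until the matching stops changing. The Python/NumPy sketch below is a minimal, generic version of that loop, not the released coocmap code: the Procrustes refit is a vecmap-style assumption, and csls_match, self_learn, and the convergence test are illustrative names.

```python
import numpy as np

def csls_match(X, Y, k=10):
    """Induce a dictionary with CSLS: score(x, y) = 2*cos(x, y) - r_X(x) - r_Y(y),
    where r_* is the mean cosine similarity to the k nearest neighbors
    (k = 10, as in the paper's setup)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    sim = Xn @ Yn.T                                # pairwise cosine similarities
    r_x = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    r_y = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return np.argmax(2 * sim - r_x - r_y, axis=1)  # best target per source word

def self_learn(X, Y, n_iters=20, k=10):
    """Generic self-learning loop (illustrative sketch). X, Y hold the source-
    and target-language word representations, e.g. rows of normalized
    co-occurrence matrices (coocmap) or embeddings (vecmap)."""
    match = csls_match(X, Y, k)                    # initial, possibly noisy dictionary
    for _ in range(n_iters):
        # Refit an orthogonal map W from X to Y on the current matches via
        # Procrustes (a vecmap-style step, assumed here for concreteness).
        U, _, Vt = np.linalg.svd(X.T @ Y[match], full_matrices=False)
        W = U @ Vt
        new_match = csls_match(X @ W, Y, k)        # re-induce the dictionary
        if np.array_equal(new_match, match):       # matching stabilized: converged
            break
        match = new_match
    return match
```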
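
The setup row's dimension- and data-size-dependent schedules can be made concrete. The snippet below sketches them using the fasttext Python bindings (fasttext.train_unsupervised is the library's real entry point); the negative exponent in the learning-rate schedule is reconstructed from "slowed ... in higher dimensions", and the integer rounding of the epoch count is an assumption.

```python
import fasttext

def train_vectors(corpus_path, d=300, data_mb=300.0):
    """Train skipgram vectors with the schedules described in the setup row.
    corpus_path, d, and data_mb are illustrative parameter names."""
    # Learning rate slowed for higher dimensions: lr = 0.1 * (d/50)^(-1/2),
    # which recovers roughly the 0.05 default around d = 300.
    lr = 0.1 * (d / 50) ** -0.5
    # More epochs on smaller data: epoch = 5 * (300/|D|)^(1/2), |D| in MB.
    # fasttext needs an integer epoch count; the rounding is an assumption.
    epoch = max(1, round(5 * (300 / data_mb) ** 0.5))
    return fasttext.train_unsupervised(
        corpus_path,
        model="skipgram",  # default model in the paper's main results
        dim=d,             # dim: 300 by default
        ws=5,              # window size m = 5, shared with coocmap
        lr=lr,
        epoch=epoch,
    )
```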