Massively Multilingual Sparse Word Representations

Authors: Gábor Berend

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our proposed algorithm behaves competitively to strong baselines through a series of rigorous experiments performed towards downstream applications spanning over dependency parsing, document classification and natural language inference.
Researcher Affiliation | Academia | Gábor Berend (1,2); 1 University of Szeged, Institute of Informatics, Szeged, Hungary; 2 MTA-SZTE RGAI, Szeged, Hungary; berendg@inf.u-szeged
Pseudocode | Yes | Algorithm 1: Pseudocode of MAMUS
Open Source Code | Yes | We make our sparse embeddings for 27 languages and the source code that we used to obtain them publicly available at https://github.com/begab/mamus.
Open Datasets | Yes | Our primary source for evaluating our proposed representations is the massively multilingual evaluation framework from (Ammar et al., 2016b), which also includes recommended corpora to be used for training word representations for more than 70 languages. All the embeddings used in our experiments were trained over these recommended resources, which is a combination of the Leipzig Corpora Collection (Goldhahn et al., 2012) and Europarl (Koehn, 2005).
Dataset Splits | Yes | During the monolingual experiments, we were solely focusing on the development set for English to set the hyperparameter controlling the sparsity of the representations.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., CPU/GPU models, memory, or computing environment); it only reports training times.
Software Dependencies | Yes | We implemented a simple multilayer perceptron in PyTorch v1.1 (Paszke et al., 2017) with two hidden layers employing ReLU nonlinearity. (A minimal sketch of such an MLP follows the table.)
Experiment Setup | Yes | We simply used the default settings of fasttext for training, meaning that the original dense word representations were 100 dimensional. We set the number of semantic atoms in the dictionary matrix D consistently as k = 1200 throughout all our experiments. Based on our monolingual evaluation results from Table 1, we decided to fix the regularization coefficient for MAMUS at λ = 0.1 for all of our upcoming multilingual experiments. The MLP uses categorical cross-entropy as its loss function, which was optimized by Adam (Kingma & Ba, 2014). (A sketch of this sparse coding setup also follows the table.)
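
The Pseudocode and Experiment Setup rows describe MAMUS in terms of a dictionary matrix D with k = 1200 semantic atoms and a sparsity-controlling regularization coefficient λ = 0.1. The snippet below is a minimal sketch of that kind of dictionary-learning-based sparse coding using scikit-learn; it is not the authors' MAMUS implementation (see the paper's Algorithm 1 and https://github.com/begab/mamus), and the vocabulary size and the random stand-in for the fastText vectors are placeholder assumptions.

# Minimal sketch (not the authors' MAMUS code): sparse-code 100-d fastText
# vectors against a learned dictionary of k = 1200 atoms with lambda = 0.1.
# The vocabulary size and the random input matrix are placeholder assumptions.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

k, lam = 1200, 0.1                                  # values from the Experiment Setup row

rng = np.random.default_rng(0)
dense_vectors = rng.standard_normal((5000, 100))    # stand-in for fastText embeddings

coder = MiniBatchDictionaryLearning(
    n_components=k,                                 # semantic atoms in the dictionary D
    alpha=lam,                                      # sparsity-inducing regularization
    transform_algorithm="lasso_lars",
    transform_alpha=lam,
    batch_size=256,
    random_state=0,
)
sparse_codes = coder.fit_transform(dense_vectors)   # shape (5000, 1200), mostly zeros
D = coder.components_                               # dictionary matrix, shape (1200, 100)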
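
The Software Dependencies and Experiment Setup rows also describe the downstream classifier: a simple multilayer perceptron in PyTorch with two ReLU hidden layers, trained with categorical cross-entropy and Adam. The sketch below follows that description; the hidden width, number of classes, batch size, and the single training step shown are illustrative assumptions rather than the paper's exact configuration.

# Minimal sketch of an MLP classifier matching the description above:
# two hidden ReLU layers, categorical cross-entropy loss, Adam optimizer.
# Hidden width, class count and input data are illustrative assumptions.
import torch
import torch.nn as nn

class MLPClassifier(nn.Module):
    def __init__(self, input_dim=1200, hidden_dim=300, num_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)                # raw logits; CrossEntropyLoss applies log-softmax

model = MLPClassifier()
criterion = nn.CrossEntropyLoss()         # categorical cross-entropy
optimizer = torch.optim.Adam(model.parameters())

# One illustrative training step on random data.
x = torch.randn(32, 1200)                 # batch of sparse input features
y = torch.randint(0, 4, (32,))            # random class labels
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()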