Minimally-Constrained Multilingual Embeddings via Artificial Code-Switching

Authors: Michael Wick, Pallika Kanani, Adam Pocock

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The purpose of our experiments is to assess the quality and utility of the multilingual embedding spaces. The first set of experiments measures the former, and the second set measures the latter on the task of sentiment analysis. We select five languages to represent various levels of resource-availability, as reflected by the number of Wikipedia pages.
Researcher Affiliation | Industry | Michael Wick, Oracle Labs (michael.wick@oracle.com); Pallika Kanani, Oracle Labs (pallika.kanani@oracle.com); Adam Pocock, Oracle Labs (adam.pocock@oracle.com)
Pseudocode | No | The paper describes the methods textually and with diagrams (Figure 2), but does not include any explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We supplement our own datasets with additional Spanish (Hu and Liu 2004) and English data (Nakov et al. 2013).
Dataset Splits | Yes | Table 1 lists '#Train' and '#Test' columns for each language, giving the absolute number of sentiment documents used for training and testing, e.g., 'English (en) ... 24960 6393'.
Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models, memory, or specific computing environments used for the experiments.
Software Dependencies | No | The paper mentions using 'FACTORIE for training (McCallum, Schultz, and Singh 2009)' but does not provide a specific version number for FACTORIE or any other software dependency.
Experiment Setup | Yes | In all experiments, we use the same CBOW parameters (2 iterations, 300 dimensions, learning rate 0.05, filter words occurring fewer than 10 times).
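
The Experiment Setup row reports concrete CBOW hyperparameters, but the paper does not name the toolkit used to train the embeddings. The snippet below is only a minimal sketch, assuming gensim's Word2Vec as a stand-in implementation, of how the reported settings (2 iterations, 300 dimensions, learning rate 0.05, minimum word count 10) map onto a CBOW configuration; the toy corpus is invented for illustration.

```python
# Hypothetical sketch: the paper reports the CBOW hyperparameters but does not
# state which toolkit implements them; gensim is assumed here for illustration.
from gensim.models import Word2Vec

# Toy stand-in corpus. In the paper the input is multilingual Wikipedia text with
# artificial code-switching applied; the repetition keeps every toy word above
# the min_count=10 frequency cutoff so the example actually trains.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["el", "gato", "se", "sentó", "en", "la", "alfombra"],
] * 10

model = Word2Vec(
    sentences,
    sg=0,             # CBOW architecture (sg=1 would select skip-gram)
    vector_size=300,  # 300-dimensional embeddings, as reported
    alpha=0.05,       # learning rate 0.05, as reported
    epochs=2,         # 2 training iterations, as reported
    min_count=10,     # discard words occurring fewer than 10 times, as reported
)

print(model.wv["cat"].shape)  # (300,)
```

The parameter names (sg, vector_size, alpha, epochs, min_count) are gensim's, used purely to make the reported settings concrete; the paper's actual training pipeline and corpus preparation are not specified beyond the hyperparameters quoted above.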