Minimally-Constrained Multilingual Embeddings via Artificial Code-Switching
Authors: Michael Wick, Pallika Kanani, Adam Pocock
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The purpose of our experiments is to assess the quality and utility of the multilingual embedding spaces. The first set of experiments measures the former, and the second set measures the latter on the task of sentiment analysis. We select five languages to represent various levels of resource-availability, as reflected by the number of Wikipedia pages. |
| Researcher Affiliation | Industry | Michael Wick, Oracle Labs, michael.wick@oracle.com; Pallika Kanani, Oracle Labs, pallika.kanani@oracle.com; Adam Pocock, Oracle Labs, adam.pocock@oracle.com |
| Pseudocode | No | The paper describes the methods textually and with diagrams (Figure 2), but does not include any explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We supplement our own datasets with additional Spanish (Hu and Liu 2004) and English data (Nakov et al. 2013). |
| Dataset Splits | Yes | Table 1 lists '#Train' and '#Test' columns for each language, giving the absolute number of training and test documents in the sentiment data, e.g., 'English (en) ... 24960 6393'. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models, memory, or specific computing environments used for the experiments. |
| Software Dependencies | No | The paper mentions using 'FACTORIE for training (McCallum, Schultz, and Singh 2009)' but does not provide a specific version number for FACTORIE or any other software dependency. |
| Experiment Setup | Yes | In all experiments, we use the same CBOW parameters (2 iterations, 300 dimensions, learning rate 0.05, filter words occurring fewer than 10 times). |