BilBOWA: Fast Bilingual Distributed Representations without Word Alignments
Authors: Stephan Gouws, Yoshua Bengio, Greg Corrado
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that bilingual embeddings learned using the proposed model outperform state-of-the-art methods on a cross-lingual document classification task as well as a lexical translation task on WMT11 data. We experimentally evaluate the induced cross-lingual embeddings on a document-classification (Section 5.1) and lexical translation (Section 5.2) task, where the method outperforms current state-of-the-art methods, with training time reduced to minutes or hours compared to several days for prior approaches. |
| Researcher Affiliation | Collaboration | Stephan Gouws (SGOUWS@GOOGLE.COM), Google Inc., Mountain View, CA, USA; Yoshua Bengio, Dept. IRO, Université de Montréal, QC, Canada & Canadian Institute for Advanced Research; Greg Corrado, Google Inc., Mountain View, CA, USA |
| Pseudocode | No | The paper describes the model and training process textually and with equations, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | finally, we make available our efficient C implementation to hopefully stimulate further research on cross-lingual distributed feature learning. (Footnote 1: https://github.com/gouwsmeister/bilbowa) |
| Open Datasets | Yes | For monolingual training data, we use the freely available, pretokenized Wikipedia datasets (Al-Rfou et al., 2013). For cross-lingual training we use the freely-available Europarl v7 corpus (Koehn, 2005). |
| Dataset Splits | Yes | For the classification experiments, 15,000 documents (for each language) were randomly selected from the RCV1/2 corpus, with one third (5,000) used as the test set and the remainder divided into training sets of sizes between 100 and 10,000, and a separate, held-out validation set of 1,000 documents used during the development of our models. (See the split sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running the experiments (e.g., CPU/GPU models, memory). |
| Software Dependencies | No | The paper states: "We implemented our model in C by building on the popular open-source word2vec toolkit3.", but it does not specify the compiler or word2vec toolkit versions used. |
| Experiment Setup | Yes | Embedding matrices were initialized by drawing from a zero-mean, unit-variance Gaussian distribution. The learning rate was set to 0.1, with linear decay, and individual updates were clipped to [-0.1, 0.1] per thread. We set k = 15, which has been shown to give good results. (See the hyperparameter sketch below the table.) |
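
As a companion to the Dataset Splits row, below is a minimal sketch of how the described RCV1/2 partition could be reproduced (15,000 documents per language, 5,000 test, training sets of 100 to 10,000 documents, and a held-out validation set of 1,000 documents). The quote does not state whether the validation documents come from the same 15,000-document sample, so drawing them separately here is an assumption; the training-set size ladder and all function and variable names are likewise illustrative rather than taken from the paper or its released code.

```python
import random

# Illustrative ladder of training-set sizes "between 100 and 10,000" (assumption).
TRAIN_SIZES = [100, 200, 500, 1000, 5000, 10000]

def make_splits(doc_ids, seed=0):
    """Split RCV1/2 document ids as described in the Dataset Splits row."""
    rng = random.Random(seed)
    sample = rng.sample(doc_ids, 15000)              # 15,000 documents per language
    test = sample[:5000]                             # one third (5,000) as the test set
    pool = sample[5000:]                             # remainder used for training sets
    train_sets = {n: pool[:n] for n in TRAIN_SIZES}  # nested training sets
    # Held-out validation set of 1,000 documents; drawn outside the 15,000-document
    # sample here because the quote calls it "separate" (assumption).
    sampled = set(sample)
    remaining = [d for d in doc_ids if d not in sampled]
    validation = rng.sample(remaining, 1000)
    return train_sets, validation, test
```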
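The Experiment Setup row can likewise be read as a small set of concrete training rules. The sketch below collects them (Gaussian initialization, learning rate 0.1 with linear decay, per-update clipping to [-0.1, 0.1], k = 15); the embedding dimensionality and function names are assumptions for illustration, not the paper's released configuration.

```python
import numpy as np

DIM = 40           # embedding dimensionality (illustrative assumption)
INITIAL_LR = 0.1   # learning rate 0.1, decayed linearly over training
CLIP = 0.1         # individual updates clipped to [-0.1, 0.1]
K = 15             # k = 15, as quoted in the Experiment Setup row

def init_embeddings(vocab_size, rng=None):
    """Initialize an embedding matrix from a zero-mean, unit-variance Gaussian."""
    rng = rng or np.random.default_rng(0)
    return rng.standard_normal((vocab_size, DIM))

def clipped_sgd_step(vec, grad, progress):
    """One SGD step: linearly decayed learning rate, updates clipped per element.

    `progress` is the fraction of training completed, in [0, 1].
    """
    lr = INITIAL_LR * (1.0 - progress)
    update = np.clip(lr * grad, -CLIP, CLIP)
    vec -= update
    return vec
```

For example, `clipped_sgd_step(E[w], grad, progress)` would update a single word vector `E[w]` in place using the clipped, decayed step described above.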