Multilingual Distributed Representations without Word Alignment
Authors: Karl Moritz Hermann, Phil Blunsom
ICLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present results from two experiments. The BiCVM model was trained on 500k sentence pairs of the English-German parallel section of the Europarl corpus. We evaluate our model using the cross-lingual document classification (CLDC) task of Klementiev et al. [16]. |
| Researcher Affiliation | Academia | Karl Moritz Hermann and Phil Blunsom Department of Computer Science University of Oxford Oxford, OX1 3QD, UK {karl.moritz.hermann,phil.blunsom}@cs.ox.ac.uk |
| Pseudocode | No | The paper describes the model using equations and textual explanations, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Results for other dimensionalities and our source code for our model are available at http://www.karlmoritz.com. |
| Open Datasets | Yes | We use the Europarl corpus (v7) for training the bilingual model. The corpus was pre-processed using the set of tools provided by cdec [9] for tokenizing and lowercasing the data. (Europarl: http://www.statmt.org/europarl/) |
| Dataset Splits | Yes | We ran the CLDC experiments both by training on English and testing on German documents and vice versa. Using the data splits provided by [16], we used varying training data sizes from 100 to 10,000 documents for training the multiclass classifier (the classification protocol is sketched after the table). |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU, GPU models, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'cdec' for pre-processing and refers to an 'averaged perceptron classifier' implementation from prior work, but it does not specify any other software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other library versions). |
| Experiment Setup | Yes | L2 regularization strength (1), step size (0.1), number of noise elements (50), margin size (50), embedding dimensionality (d = 40). We use the adaptive gradient method, AdaGrad [8], for updating the weights of our models, and terminate training after 50 iterations. (These hyperparameters are illustrated in the training sketch below.) |
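
For readers reconstructing the setup, the quoted hyperparameters fit the paper's noise-contrastive large-margin objective: sentence vectors are composed additively, and the squared distance between an aligned sentence pair is pushed below the distance to randomly sampled noise sentences by a margin. The NumPy sketch below is a minimal illustration under those assumptions; the toy corpus sizes, the `EPS` constant, and all helper names are ours, and the L2 term (strength 1) is omitted from the update for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperparameters quoted in the experiment setup above.
DIM = 40        # embedding dimensionality d
MARGIN = 50.0   # hinge margin
N_NOISE = 50    # noise samples contrasted against each aligned pair
STEP = 0.1      # AdaGrad step size
EPS = 1e-8      # numerical-stability constant for AdaGrad (our assumption)

# Toy vocabularies; the real model is trained on 500k Europarl en-de pairs.
emb_en = rng.normal(scale=0.1, size=(1000, DIM))
emb_de = rng.normal(scale=0.1, size=(1000, DIM))
gsq_en = np.zeros_like(emb_en)  # AdaGrad squared-gradient accumulators
gsq_de = np.zeros_like(emb_de)

def compose(emb, sent):
    """Additive composition: a sentence vector is the sum of its word vectors."""
    return emb[sent].sum(axis=0)

def pair_step(sent_en, sent_de, corpus_de):
    """One noise-contrastive update for a single aligned sentence pair."""
    a, b = compose(emb_en, sent_en), compose(emb_de, sent_de)
    grad_a, grad_b = np.zeros(DIM), np.zeros(DIM)
    noise_grads = []
    for _ in range(N_NOISE):
        noise = corpus_de[rng.integers(len(corpus_de))]  # random noise sentence
        n = compose(emb_de, noise)
        # Hinge term: margin + ||a - b||^2 - ||a - n||^2, clipped at zero.
        if MARGIN + np.sum((a - b) ** 2) - np.sum((a - n) ** 2) > 0:
            grad_a += 2.0 * (n - b)       # d/da of the violated hinge term
            grad_b += -2.0 * (a - b)      # d/db
            noise_grads.append((noise, 2.0 * (a - n)))  # d/dn
    _update(emb_en, gsq_en, sent_en, grad_a)
    _update(emb_de, gsq_de, sent_de, grad_b)
    for noise, g in noise_grads:
        _update(emb_de, gsq_de, noise, g)

def _update(emb, gsq, sent, g):
    """Distribute a sentence-level gradient to its words, then apply AdaGrad."""
    for w in sent:
        gsq[w] += g ** 2
        emb[w] -= STEP * g / np.sqrt(gsq[w] + EPS)

# Toy training loop: "terminate training after 50 iterations".
corpus_en = [rng.integers(1000, size=rng.integers(5, 15)) for _ in range(200)]
corpus_de = [rng.integers(1000, size=rng.integers(5, 15)) for _ in range(200)]
for epoch in range(50):
    for se, sd in zip(corpus_en, corpus_de):
        pair_step(se, sd, corpus_de)
```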
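
The CLDC evaluation then trains a classifier on document vectors in one language and tests it on the other, which works because both languages share one embedding space. The paper reuses the averaged perceptron of Klementiev et al. [16]; the sketch below is a generic multiclass averaged perceptron, and the document representation (mean of word embeddings), the epoch count, and all names are our assumptions, not the authors' exact protocol.

```python
import numpy as np

def doc_vector(emb, word_ids):
    """Assumed document representation: mean of the document's word embeddings."""
    return emb[word_ids].mean(axis=0)

def train_averaged_perceptron(X, y, n_classes, n_epochs=10):
    """Multiclass averaged perceptron over fixed document vectors X (n_docs x dim)."""
    W = np.zeros((n_classes, X.shape[1]))
    W_sum = np.zeros_like(W)              # running sum of weights for averaging
    for _ in range(n_epochs):
        for x, label in zip(X, y):
            pred = int(np.argmax(W @ x))
            if pred != label:             # standard perceptron mistake update
                W[label] += x
                W[pred] -= x
            W_sum += W
    return W_sum / (n_epochs * len(X))    # averaged weights

# Cross-lingual transfer: fit on English document vectors, predict on German ones.
# W_avg = train_averaged_perceptron(X_en, y_en, n_classes=4)
# preds = np.argmax(X_de @ W_avg.T, axis=1)
```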