Multilingual Distributed Representations without Word Alignment

Authors: Karl Moritz Hermann, Phil Blunsom

ICLR 2014

Each entry below lists a reproducibility variable, the assessed result, and the supporting LLM response; evidence quoted verbatim from the paper appears in quotation marks.
Research Type: Experimental. "We present results from two experiments. The BICVM model was trained on 500k sentence pairs of the English-German parallel section of the Europarl corpus. We evaluate our model using the cross-lingual document classification (CLDC) task of Klementiev et al. [16]."
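The quoted setup corresponds to the paper's compositional objective: each sentence is composed additively from its word vectors, and aligned sentence pairs are pulled closer in the shared space than randomly sampled noise pairs, by a fixed margin. Below is a minimal NumPy sketch of that noise-contrastive hinge loss; the function names and inputs are illustrative, not taken from the released code.

```python
import numpy as np

def compose(word_vectors):
    """Additive (CVM) composition: a sentence vector is the sum of its word vectors."""
    return np.sum(word_vectors, axis=0)

def energy(a_vecs, b_vecs):
    """Squared Euclidean distance between two composed sentence representations."""
    diff = compose(a_vecs) - compose(b_vecs)
    return float(diff @ diff)

def noise_contrastive_hinge(a_vecs, b_vecs, noise_sentences, margin=50.0):
    """Large-margin objective: the aligned pair (a, b) should sit at least
    `margin` lower in energy than each noise pair (a, n)."""
    e_pos = energy(a_vecs, b_vecs)
    return sum(max(0.0, margin + e_pos - energy(a_vecs, n))
               for n in noise_sentences)
```

The margin default of 50.0 mirrors the margin size quoted in the experiment-setup entry further down.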
Researcher Affiliation: Academia. Karl Moritz Hermann and Phil Blunsom, Department of Computer Science, University of Oxford, Oxford, OX1 3QD, UK ({karl.moritz.hermann,phil.blunsom}@cs.ox.ac.uk).
Pseudocode: No. The paper describes the model through equations and textual explanations but includes no structured pseudocode or algorithm blocks.
Open Source Code: Yes. "Results for other dimensionalities and the source code for our model are available at http://www.karlmoritz.com."
Open Datasets: Yes. "We use the Europarl corpus (v7) for training the bilingual model. The corpus was pre-processed using the set of tools provided by cdec [9] for tokenizing and lowercasing the data." The corpus is publicly available at http://www.statmt.org/europarl/.
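Since this entry only names the pre-processing steps (tokenizing and lowercasing), here is a rough stand-in sketch; cdec's actual tokenizer applies language-aware rules, so the whitespace splitting below is an assumption made purely for illustration.

```python
def preprocess(line):
    # Stand-in for the cdec pipeline: lowercase, then tokenize.
    # cdec handles punctuation and language-specific rules; plain
    # whitespace splitting is only an approximation.
    return line.lower().split()

# Hypothetical parallel sentence pair in the style of Europarl.
en_tokens = preprocess("The sitting is resumed .")
de_tokens = preprocess("Die Sitzung wird wieder aufgenommen .")
```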
Dataset Splits: Yes. "We ran the CLDC experiments both by training on English and testing on German documents and vice versa. Using the data splits provided by [16], we used varying training data sizes from 100 to 10,000 documents for training the multiclass classifier."
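In the CLDC setup, documents from these splits are mapped into the shared embedding space and a multiclass classifier is trained on the resulting vectors; the paper follows [16] in using an averaged perceptron. A hedged sketch follows, assuming plain averaging for document composition and an illustrative epoch count:

```python
import numpy as np

def document_vector(tokens, embeddings, dim=40):
    """Average the embeddings of a document's known words (d=40 per the setup entry below)."""
    vecs = [embeddings[w] for w in tokens if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def train_averaged_perceptron(X, y, n_classes, epochs=10):
    """Multiclass averaged perceptron: one weight vector per class,
    with the returned weights averaged over all updates."""
    W = np.zeros((n_classes, X.shape[1]))   # current weights
    W_sum = np.zeros_like(W)                # running sum for averaging
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = int(np.argmax(W @ x))
            if pred != label:               # standard perceptron update
                W[label] += x
                W[pred] -= x
            W_sum += W
    return W_sum / (epochs * len(X))
```

Averaging the weights over all updates, rather than keeping the final weights, is what makes the perceptron "averaged" and typically stabilizes it on small training sets such as the 100-document condition.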
Hardware Specification: No. The paper does not specify any hardware details, such as CPU or GPU models or the memory used, for running the experiments.
Software Dependencies: No. The paper mentions using cdec for pre-processing and refers to an averaged perceptron classifier implementation from prior work, but it does not name any other software dependencies or their version numbers.
Experiment Setup: Yes. "L2 regularization (1), step-size (0.1), number of noise elements (50), margin size (50), embedding dimensionality (d=40). We use the adaptive gradient method, AdaGrad [8], for updating the weights of our models, and terminate training after 50 iterations."
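The quoted step-size of 0.1 feeds directly into AdaGrad's per-parameter update, which scales each gradient by the inverse square root of the accumulated squared gradients. A minimal sketch; the epsilon term is a common numerical-stability assumption, not a value stated in the paper:

```python
import numpy as np

def adagrad_update(w, grad, hist, step_size=0.1, eps=1e-6):
    """One AdaGrad step: per-parameter learning rates shrink for
    parameters whose gradients have historically been large."""
    hist += grad ** 2                              # accumulate squared gradients
    w -= step_size * grad / (np.sqrt(hist) + eps)  # scaled gradient step
    return w, hist
```

Per the quoted setup, training would apply this update across the corpus and terminate after 50 iterations.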