Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Leveraging Monolingual Data for Crosslingual Compositional Word Representations
Authors: Hubert Soyer, Pontus Stenetorp, and Akiko Aizawa
ICLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a well-established crosslingual document classification task and achieve results that are either comparable, or greatly improve upon previous state-of-the-art methods. Concretely, our method reaches a level of 92.7% and 84.4% accuracy for the English to German and German to English sub-tasks respectively. |
| Researcher Affiliation | Academia | Hubert Soyer National Institute of Informatics, Tokyo, Japan EMAIL Pontus Stenetorp University of Tokyo, Tokyo, Japan EMAIL Akiko Aizawa National Institute of Informatics, Tokyo, Japan EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our implementation is available at https://github.com/ogh/binclusion |
| Open Datasets | Yes | Like Klementiev et al. (2012) we choose Euro Parl v7 (Koehn, 2005) as our bilingual corpus and leverage the English and German parts of the RCV1 and RCV2 corpora as monolingual resources. |
| Dataset Splits | No | The paper mentions tuning hyperparameters on "held out documents" but does not give the size or percentage of a validation split. The test-set size is stated, but no validation set is specified. |
| Hardware Specification | No | The paper mentions training on a "single-core desktop computer" but does not provide specific hardware details such as CPU model, GPU model, or memory specifications. |
| Software Dependencies | No | The paper mentions software like "NLTK" and "cdec decoder" and the programming language "Julia" but does not specify version numbers for any of these components. |
| Experiment Setup | Yes | We tuned all hyperparameters of our model and explored learning rates around 0.2, mini-batch sizes around 40,000, hinge loss margins around 40 (since our vector dimensionality is 40) and λ (regularization) around 1.0. |
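To make the quoted hyperparameters concrete, the following is a minimal, hedged sketch of a margin-based hinge loss over paired sentence vectors using the values reported above. It is not the authors' implementation (which was written in Julia); the pairing scheme, the squared-distance scoring, and the function names here are illustrative assumptions only.

```python
import numpy as np

# Hypothetical sketch only -- NOT the authors' code. The distance-based
# scoring and the noise-sampling setup are assumptions for illustration.
DIM = 40            # vector dimensionality (quoted in the table above)
MARGIN = 40.0       # hinge loss margin, set around the dimensionality (quoted)
LEARNING_RATE = 0.2 # explored learning rate (quoted)
LAMBDA = 1.0        # regularization strength (quoted)

def hinge_loss(aligned_a, aligned_b, noise_b):
    """Penalize the model when an aligned (translation) pair is not closer,
    by squared Euclidean distance, than a noise pair by at least MARGIN."""
    pos = np.sum((aligned_a - aligned_b) ** 2)
    neg = np.sum((aligned_a - noise_b) ** 2)
    return max(0.0, MARGIN + pos - neg)

rng = np.random.default_rng(0)
a = rng.normal(size=DIM)             # e.g. an English sentence vector
b = a + 0.1 * rng.normal(size=DIM)   # a near-translation of a
n = rng.normal(size=DIM)             # an unrelated noise vector
loss = hinge_loss(a, b, n)
```

Note that when the positive and noise pairs score identically, the loss reduces to the margin itself, which is why the margin is reported on the same scale as the squared distances between 40-dimensional vectors.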