A Variational Autoencoding Approach for Inducing Cross-lingual Word Embeddings

Authors: Liangchen Wei, Zhi-Hong Deng

IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on the task of cross-lingual document classification have shown that our method is effective.
Researcher Affiliation | Academia | Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; liangchen.wei@pku.edu.cn, zhdeng@cis.pku.edu.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code (a specific repository link, an explicit code-release statement, or code in supplementary materials) for the methodology described in this paper.
Open Datasets | Yes | As our joint-space model utilizes only a parallel corpus, we train the bilingual embeddings for the English-German language pair using the Europarl v7 parallel corpus [Koehn, 2005], and use the induced representations to classify a subset of the English and German sections of the Reuters RCV1/RCV2 multilingual corpora [Lewis et al., 2004] that are assigned to only one of four categories: CCAT (Corporate/Industrial), ECAT (Economics), GCAT (Government/Social), and MCAT (Markets).
Dataset Splits | Yes | For the classification experiment, 15000 documents (for each language) were selected randomly by Klementiev [Klementiev et al., 2012] from the RCV1/RCV2 corpora. One third of the selected documents (5000) were used as the test set, and a varying number, between 100 and 10000, of the remainder were used as training sets. Another 1000 documents were kept as a development set for hyper-parameter tuning.
Hardware Specification | No | The paper mentions that the model is implemented using TensorFlow but does not specify any hardware details such as CPU or GPU models, or memory.
Software Dependencies | No | The paper mentions 'TensorFlow', 'ADAM', and 'dropout and batch normalization', but does not specify version numbers for these software components, which are required for reproducibility.
Experiment Setup | Yes | We use 200 units for the LSTM memory cell and 40 units for the latent variable z, and consequently 40 units for the word embeddings.
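To make the dimensions quoted in the Experiment Setup row concrete, the sketch below builds a variational LSTM encoder with 200 memory-cell units and a 40-dimensional latent variable z. This is a minimal sketch assuming a TensorFlow/Keras implementation; the class name, vocabulary size, and layer arrangement are assumptions for illustration, not the authors' released code.

```python
# Illustrative sketch of the quoted encoder sizes (200 LSTM units, 40-d latent z),
# assuming TensorFlow/Keras; architecture details beyond the sizes are assumptions.
import tensorflow as tf

LSTM_UNITS = 200   # "200 units for the LSTM memory cell"
LATENT_DIM = 40    # "40 units for the latent variable z" = word-embedding size

class VariationalEncoder(tf.keras.Model):
    """Maps token ids to a 40-dimensional latent code via a 200-unit LSTM."""

    def __init__(self, vocab_size=50000):
        super().__init__()
        self.embed = tf.keras.layers.Embedding(vocab_size, LATENT_DIM)
        self.lstm = tf.keras.layers.LSTM(LSTM_UNITS)
        self.mean = tf.keras.layers.Dense(LATENT_DIM)
        self.log_var = tf.keras.layers.Dense(LATENT_DIM)

    def call(self, tokens):
        h = self.lstm(self.embed(tokens))          # sentence encoding
        z_mean = self.mean(h)                      # variational mean
        z_log_var = self.log_var(h)                # variational log-variance
        # Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, I).
        eps = tf.random.normal(tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps
```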
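Likewise, the Dataset Splits row can be read as the following split procedure. Only the sizes (15000 documents per language, 5000 test, 1000 development, 100 to 10000 training) come from the quoted setup; the helper name, shuffling, and seed are hypothetical.

```python
# Sketch of the RCV1/RCV2 document split described above; the sizes are from the
# quoted setup, the function itself is a hypothetical reconstruction.
import random

def split_documents(documents, train_size, seed=0):
    """Split 15000 sampled documents (one language) into train/dev/test."""
    assert len(documents) == 15000, "15000 documents are sampled per language"
    assert 100 <= train_size <= 10000, "training sets range from 100 to 10000"
    rng = random.Random(seed)
    docs = list(documents)
    rng.shuffle(docs)
    test = docs[:5000]           # one third held out as the test set
    dev = docs[5000:6000]        # 1000 documents for hyper-parameter tuning
    train = docs[6000:6000 + train_size]
    return train, dev, test
```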