Enhancing Semantic Representations of Bilingual Word Embeddings with Syntactic Dependencies
Authors: Linli Xu, Wenjun Ouyang, Xiaoying Ren, Yang Wang, Liang Jiang
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on a real-world dataset clearly validate the superiority of the proposed model DepBiWE on various natural language processing (NLP) tasks. |
| Researcher Affiliation | Academia | Linli Xu, Wenjun Ouyang, Xiaoying Ren, Yang Wang and Liang Jiang. Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China. linlixu@ustc.edu.cn, {oy01, wjren}@mail.ustc.edu.cn, wangyan@ustc.edu.cn, jal@mail.ustc.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | Yes | We train our dependency-based bilingual models for the English-German (en-de), English-French (en-fr) and English-Spanish (en-es) language pairs on the Europarl v7 parallel corpus [Koehn, 2005]. ... The training and test data are sourced from the Reuters RCV1/RCV2 multilingual corpus [Lewis et al., 2004] |
| Dataset Splits | Yes | For the classification experiments, 15,000 documents for each language are selected randomly from the RCV1/RCV2 corpus, in which 5,000 documents are used as the test data and a subset with varying sizes between 100 and 10,000 of the remainder serves as the training data. Meanwhile, we keep 1,000 documents as the development set for hyper-parameter tuning. (The split protocol is sketched in code below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | Yes | Word alignments are obtained with fast_align [Dyer et al., 2013], and the Python library spaCy is employed to produce the dependency parse-trees for all languages in the parallel corpus for the dependency-based models. (spaCy parsing is sketched in code below the table.) |
| Experiment Setup | Yes | Parameters for bilingual embedding learning are set as suggested in BiSkip [Luong et al., 2015] and fixed for all experiments. The subsampling rate and negative sampling size are set to 1e-4 and 30 respectively; the default learning rate of Stochastic Gradient Descent (SGD) is set to 0.025 and gradually decreases to 2.5e-6 when training is finished. The dimensionality of all embedding vectors d is set to 200, and experiments are run for 10 epochs. We set the monolingual weight α and bilingual weight β in Equation (4) to 1.0 and 4.0 respectively, with the regularization weight γ_R = 0.1. (The full configuration is collected in a sketch below the table.) |
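The split quoted in the "Dataset Splits" row is straightforward to reproduce. Below is a minimal sketch assuming the 15,000 per-language documents are already loaded as a list; the function name, the seed, and the placement of the development set are illustrative assumptions, not details from the paper.

```python
import random

def split_rcv_documents(documents, train_size, seed=0):
    """Split 15,000 RCV1/RCV2 documents following the quoted protocol:
    5,000 test documents, a training subset of 100..10,000 documents
    drawn from the remainder, and 1,000 development documents."""
    assert len(documents) == 15_000
    assert 100 <= train_size <= 10_000

    rng = random.Random(seed)  # the seed is illustrative; the paper fixes none
    docs = list(documents)
    rng.shuffle(docs)

    test_set = docs[:5_000]
    remainder = docs[5_000:]            # 10,000 documents
    train_set = remainder[:train_size]
    # The paper does not say where the 1,000 dev documents come from; here
    # they are taken from the tail of the remainder (for train_size > 9,000
    # this would overlap with training, a detail the paper leaves open).
    dev_set = remainder[-1_000:]
    return train_set, dev_set, test_set
```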
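The dependency parse-trees mentioned in the "Software Dependencies" row can be produced with spaCy roughly as follows. This is a minimal sketch with the current spaCy API; the model name `en_core_web_sm` and the example sentence are assumptions, since the paper only states that spaCy parses all languages in the parallel corpus.

```python
import spacy

# Load an English pipeline; the specific model name is an assumption.
nlp = spacy.load("en_core_web_sm")

doc = nlp("The committee approved the resolution yesterday.")
for token in doc:
    # (head, relation, dependent) triples are what dependency-based
    # embedding models use as contexts instead of linear windows.
    print(f"{token.head.text} --{token.dep_}--> {token.text}")
```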
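The hyper-parameters quoted in the "Experiment Setup" row can be collected into one configuration object, shown below as a hedged sketch: the class and field names are our own, and the linear learning-rate schedule is an assumed word2vec-style decay consistent with the stated start (0.025) and end (2.5e-6) values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DepBiWEConfig:
    # Values quoted from the paper; field names are illustrative.
    dim: int = 200            # embedding dimensionality d
    epochs: int = 10
    subsample: float = 1e-4   # subsampling rate
    negative: int = 30        # negative sampling size
    lr_start: float = 0.025   # initial SGD learning rate
    lr_end: float = 2.5e-6    # learning rate when training finishes
    alpha: float = 1.0        # monolingual weight in Equation (4)
    beta: float = 4.0         # bilingual weight in Equation (4)
    gamma_r: float = 0.1      # regularization weight

def learning_rate(cfg: DepBiWEConfig, progress: float) -> float:
    """Linearly anneal the SGD learning rate over training progress in
    [0, 1]; the paper states only the start and end values, so the linear
    shape of the decay is an assumption."""
    progress = min(max(progress, 0.0), 1.0)
    return cfg.lr_start + (cfg.lr_end - cfg.lr_start) * progress
```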