Pre-Trained Multi-View Word Embedding Using Two-Side Neural Network

Authors: Yong Luo, Jian Tang, Jun Yan, Chao Xu, Zheng Chen

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on two real-world applications, including web search ranking and word similarity measuring, show that our neural network with multiple sources outperforms the state-of-the-art word embedding algorithm with each individual source. It also outperforms other competitive baselines using multiple sources."
Researcher Affiliation | Collaboration | School of Electronics Engineering and Computer Science, Peking University, Beijing, China (email: yluo180@gmail.com; tangjianpku@gmail.com; xuchao@cis.pku.edu.cn); Microsoft Research Asia, Beijing, China (email: {junyan, zhengc}@microsoft.com)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to its own source code. The footnoted link to 'word2vec' refers to a third-party baseline used by the authors, not to an implementation of their own method.
Open Datasets | Yes | "The data used to train word embedding on free text is the first billion characters from Wikipedia" (footnote 2: http://mattmahoney.net/dc/textdata.html) ... "We use the WordSim353 (Finkelstein et al. 2002) dataset, one of the most popular collections of this kind, to evaluate our model for computing semantic word relatedness."
Dataset Splits | No | The paper states: 'We randomly sampled 300 word pairs as the training set, and the remaining 53 word pairs are used for test,' and it refers to a training set and test sets (FQ, TQ) for search ranking. However, it does not explicitly describe a validation split for hyperparameter tuning or early stopping. (A hedged sketch of the 300/53 split follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not specify ancillary software dependencies, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | "In our implementation, we set the dimensionalities of all the pre-trained word embeddings and the final unified word embedding to be 192. Thus the first hidden layer size is also 192. The middle hidden layer size is a parameter that can be decided using grid search. The initialization of the network is according to (Montavon, Orr, and Muller 2012)... That is, λ starts with a large value, e.g., 0.25, and then decreases with the number of iterations until it is smaller than a threshold. ... The nonlinear hidden layer size is set to be 576 = 192 × 3, which is the sum of the input word embedding sizes." (An illustrative sketch of these layer sizes and the λ schedule follows the table.)
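
The 300/53 WordSim353 split quoted in the Dataset Splits row is straightforward to reproduce. The following Python sketch is not the authors' code: the file name wordsim353.csv, its (word1, word2, human score) column layout, and the random seed are all assumptions.

```python
# Hedged sketch of the WordSim353 split described in the paper: 300 randomly
# sampled word pairs for training, the remaining 53 pairs held out for testing.
# The file name and column order are assumptions about how the dataset is stored.
import csv
import random

def load_word_pairs(path="wordsim353.csv"):
    """Read (word1, word2, human_score) rows; assumes a one-line header."""
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip header row, if any
        for word1, word2, score, *rest in reader:
            pairs.append((word1, word2, float(score)))
    return pairs

def split_pairs(pairs, n_train=300, seed=0):
    """Randomly pick n_train pairs for training; the rest (53 of 353) form the test set."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

if __name__ == "__main__":
    train_pairs, test_pairs = split_pairs(load_word_pairs())
    print(len(train_pairs), "training pairs,", len(test_pairs), "test pairs")
```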
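
For the Experiment Setup row, the sketch below only illustrates the quoted dimensionalities (three 192-dimensional pre-trained views, a 576-dimensional nonlinear hidden layer, a 192-dimensional unified embedding) together with a plausible λ schedule. It is not the authors' two-side architecture, which the excerpt does not fully specify; the concatenation of the views, the tanh nonlinearity, the 1/(1+t) decay rule, and the threshold value are assumptions.

```python
# Illustrative sketch of the reported layer sizes, not the authors' two-side network.
import numpy as np

EMB_DIM = 192                    # size of each pre-trained view and of the unified embedding
N_VIEWS = 3                      # number of pre-trained embedding sources
HIDDEN_DIM = EMB_DIM * N_VIEWS   # 576 = 192 * 3, the nonlinear hidden layer size

rng = np.random.default_rng(0)
W_hidden = rng.normal(scale=0.01, size=(HIDDEN_DIM, HIDDEN_DIM))
W_out = rng.normal(scale=0.01, size=(EMB_DIM, HIDDEN_DIM))

def unified_embedding(view_embeddings):
    """Concatenate three 192-d view embeddings and map them to one 192-d unified embedding."""
    x = np.concatenate(view_embeddings)   # 576-d concatenated input
    h = np.tanh(W_hidden @ x)             # nonlinear hidden layer of size 576
    return W_out @ h                      # 192-d unified word embedding

def learning_rate(iteration, lam0=0.25, threshold=1e-4):
    """λ starts at 0.25 and shrinks with the iteration count.

    The 1/(1 + t) rule and the floor at `threshold` are assumptions; the paper
    only says λ decreases until it is smaller than a threshold.
    """
    return max(lam0 / (1.0 + iteration), threshold)

# Example: one forward pass with random stand-ins for the three pre-trained embeddings.
views = [rng.normal(size=EMB_DIM) for _ in range(N_VIEWS)]
print(unified_embedding(views).shape, learning_rate(100))
```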