Pre-Trained Multi-View Word Embedding Using Two-Side Neural Network

Authors: Yong Luo, Jian Tang, Jun Yan, Chao Xu, Zheng Chen

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on two real-world applications, including web search ranking and word similarity measuring, show that our neural network with multiple sources outperforms the state-of-the-art word embedding algorithm with each individual source. It also outperforms other competitive baselines using multiple sources."
Researcher Affiliation | Collaboration | School of Electronics Engineering and Computer Science, Peking University, Beijing, China (email: yluo180@gmail.com; tangjianpku@gmail.com; xuchao@cis.pku.edu.cn); Microsoft Research Asia, Beijing, China (email: {junyan, zhengc}@microsoft.com)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to its own source code. The footnoted link to 'word2vec' refers to a third-party baseline used by the authors, not to an implementation of their own method.
Open Datasets | Yes | "The data used to train word embedding on free text is the first billion characters from Wikipedia" (footnote 2: http://mattmahoney.net/dc/textdata.html) ... "We use the WordSim353 (Finkelstein et al. 2002) dataset, one of the most popular collections of this kind, to evaluate our model for computing semantic word relatedness."
Dataset Splits | No | The paper states: 'We randomly sampled 300 word pairs as the training set, and the remaining 53 word pairs are used for test,' and it refers to a training set and test sets (FQ, TQ) for search ranking. However, it does not explicitly describe a validation split for hyperparameter tuning or early stopping. (A hedged sketch of the 300/53 split follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not specify ancillary software dependencies, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | "In our implementation, we set the dimensionalities of all the pre-trained word embeddings and the final unified word embedding to be 192. Thus the first hidden layer size is also 192. The middle hidden layer size is a parameter that can be decided using grid search. The initialization of the network is according to (Montavon, Orr, and Muller 2012)... That is, λ starts with a large value, e.g., 0.25, and then decreases with the number of iterations until it is smaller than a threshold. ... The nonlinear hidden layer size is set to be 576 = 192 × 3, which is the sum of the input word embedding sizes." (An illustrative sketch of these layer sizes and the λ schedule follows the table.)
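
The 300/53 WordSim353 split quoted in the Dataset Splits row is straightforward to reproduce. The following Python sketch is not the authors' code: the file name wordsim353.csv, its (word1, word2, human score) column layout, and the random seed are all assumptions.

```python
# Hedged sketch of the WordSim353 split described in the paper: 300 randomly
# sampled word pairs for training, the remaining 53 pairs held out for testing.
# The file name and column order are assumptions about how the dataset is stored.
import csv
import random

def load_word_pairs(path="wordsim353.csv"):
    """Read (word1, word2, human_score) rows; assumes a one-line header."""
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip header row, if any
        for word1, word2, score, *rest in reader:
            pairs.append((word1, word2, float(score)))
    return pairs

def split_pairs(pairs, n_train=300, seed=0):
    """Randomly pick n_train pairs for training; the rest (53 of 353) form the test set."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

if __name__ == "__main__":
    train_pairs, test_pairs = split_pairs(load_word_pairs())
    print(len(train_pairs), "training pairs,", len(test_pairs), "test pairs")
```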
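
For the Experiment Setup row, the sketch below only illustrates the quoted dimensionalities (three 192-dimensional pre-trained views, a 576-dimensional nonlinear hidden layer, a 192-dimensional unified embedding) together with a plausible λ schedule. It is not the authors' two-side architecture, which the excerpt does not fully specify; the concatenation of the views, the tanh nonlinearity, the 1/(1+t) decay rule, and the threshold value are assumptions.

```python
# Illustrative sketch of the reported layer sizes, not the authors' two-side network.
import numpy as np

EMB_DIM = 192                    # size of each pre-trained view and of the unified embedding
N_VIEWS = 3                      # number of pre-trained embedding sources
HIDDEN_DIM = EMB_DIM * N_VIEWS   # 576 = 192 * 3, the nonlinear hidden layer size

rng = np.random.default_rng(0)
W_hidden = rng.normal(scale=0.01, size=(HIDDEN_DIM, HIDDEN_DIM))
W_out = rng.normal(scale=0.01, size=(EMB_DIM, HIDDEN_DIM))

def unified_embedding(view_embeddings):
    """Concatenate three 192-d view embeddings and map them to one 192-d unified embedding."""
    x = np.concatenate(view_embeddings)   # 576-d concatenated input
    h = np.tanh(W_hidden @ x)             # nonlinear hidden layer of size 576
    return W_out @ h                      # 192-d unified word embedding

def learning_rate(iteration, lam0=0.25, threshold=1e-4):
    """λ starts at 0.25 and shrinks with the iteration count.

    The 1/(1 + t) rule and the floor at `threshold` are assumptions; the paper
    only says λ decreases until it is smaller than a threshold.
    """
    return max(lam0 / (1.0 + iteration), threshold)

# Example: one forward pass with random stand-ins for the three pre-trained embeddings.
views = [rng.normal(size=EMB_DIM) for _ in range(N_VIEWS)]
print(unified_embedding(views).shape, learning_rate(100))
```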