Pre-Trained Multi-View Word Embedding Using Two-Side Neural Network
Authors: Yong Luo, Jian Tang, Jun Yan, Chao Xu, Zheng Chen
AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two real-world applications, including web search ranking and word similarity measurement, show that our neural network with multiple sources outperforms the state-of-the-art word embedding algorithm trained on each individual source. It also outperforms other competitive baselines that use multiple sources. |
| Researcher Affiliation | Collaboration | School of Electronics Engineering and Computer Science, Peking University, Beijing, China (email: yluo180@gmail.com; tangjianpku@gmail.com; xuchao@cis.pku.edu.cn) Microsoft Research Asia, Beijing, China (email: {junyan, zhengc}@microsoft.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to its own source code. The footnote link to 'word2vec' refers to a third-party baseline used by the authors, not an implementation of the paper's own method. |
| Open Datasets | Yes | The data used to train word embedding on free text is the first billion characters from Wikipedia (http://mattmahoney.net/dc/textdata.html), and we call the learned word embedding CBOW_wiki. ... We use the WordSim353 (Finkelstein et al. 2002) dataset, one of the most popular collections of this kind, to evaluate our model for computing semantic word relatedness. |
| Dataset Splits | No | The paper states: 'We randomly sampled 300 word pairs as the training set, and the remaining 53 word pairs are used for test.' and refers to a 'training set' and 'test sets' (FQ, TQ) for search ranking. However, it does not explicitly describe a validation split for hyperparameter tuning or early stopping. (A minimal sketch of the reported 300/53 split appears below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | In our implementation, we set the dimensionalities of all the pre-trained word embeddings and the final unified word embedding to be 192. Thus the first hidden layer size is also 192. The middle hidden layer size is a parameter that can be decided using grid search. The initialization of the network follows (Montavon, Orr, and Müller 2012)... That is, λ starts with a large value, e.g., 0.25, and then decreases with the number of iterations until it is smaller than a threshold. ... The nonlinear hidden layer size is set to be 576 = 192 × 3, which is the sum of the input word embedding sizes. (These sizes and the λ schedule are pictured in the sketch below the table.) |
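
Below is a minimal Python sketch of the WordSim353 train/test split quoted in the Dataset Splits row. The placeholder word pairs, the fixed seed, and the use of `random.shuffle` are illustrative assumptions; the paper only states that 300 pairs were sampled at random and the remaining 53 were used for testing.

```python
import random

# Placeholder for the 353 human-scored word pairs in WordSim353
# (word_a, word_b, relatedness score); real data would be loaded from the released file.
word_pairs = [(f"word_a{i}", f"word_b{i}", float(i % 10)) for i in range(353)]

random.seed(0)              # fixed seed for reproducibility (an assumption; the paper gives none)
random.shuffle(word_pairs)

train_pairs = word_pairs[:300]   # 300 randomly sampled training pairs
test_pairs = word_pairs[300:]    # remaining 53 pairs held out for testing

print(len(train_pairs), len(test_pairs))  # 300 53
```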
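
The sizes and schedule quoted in the Experiment Setup row can be pictured with the following NumPy sketch. It is a simplification under stated assumptions: the three 192-dimensional input views, the 576-unit nonlinear hidden layer (576 = 192 × 3), the 192-dimensional unified output, and a λ that starts at 0.25 and decays until a threshold are taken from the paper; the single tanh layer, the random stand-in weights, and the multiplicative decay rate are illustrative assumptions, not the authors' two-side architecture or training procedure.

```python
import numpy as np

EMBED_DIM = 192                          # dimensionality of each pre-trained view and of the unified embedding
NUM_VIEWS = 3                            # number of pre-trained embedding sources (views)
NONLINEAR_DIM = EMBED_DIM * NUM_VIEWS    # 576, the sum of the input word embedding sizes

rng = np.random.default_rng(0)

# Stand-in layer weights: the paper initializes following Montavon, Orr, and Müller (2012);
# small random values are used here purely for illustration.
W_hidden = rng.normal(scale=0.01, size=(NONLINEAR_DIM, NONLINEAR_DIM))
W_out = rng.normal(scale=0.01, size=(NONLINEAR_DIM, EMBED_DIM))

def unified_embedding(view_embeddings):
    """Map NUM_VIEWS pre-trained 192-d embeddings of one word to a single 192-d unified embedding.

    A single tanh hidden layer of size 576 is assumed; the paper's actual two-side
    network and its grid-searched middle hidden layer are more involved.
    """
    x = np.concatenate(view_embeddings)   # 576-d concatenated input
    h = np.tanh(x @ W_hidden)             # nonlinear hidden layer (size 576)
    return h @ W_out                      # 192-d unified embedding

def lambda_schedule(initial=0.25, decay=0.99, threshold=1e-3):
    """Yield the decaying λ values described in the paper: start large (e.g., 0.25)
    and decrease with the iteration count until a threshold is reached.
    The multiplicative decay rate and the threshold are assumptions."""
    lam = initial
    while lam >= threshold:
        yield lam
        lam *= decay

# Example usage with random stand-in view embeddings.
views = [rng.normal(size=EMBED_DIM) for _ in range(NUM_VIEWS)]
print(unified_embedding(views).shape)     # (192,)
```

The decay form here is only one way to realize "decreases with the number of iterations until it is smaller than a threshold"; the paper does not specify the exact schedule.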