Complementary Learning of Word Embeddings

Authors: Yan Song, Shuming Shi

Venue: IJCAI 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results indicate that our approach can effectively improve the quality of initial embeddings, in terms of intrinsic and extrinsic evaluations. |
| Researcher Affiliation | Industry | Yan Song, Shuming Shi, Tencent AI Lab, {clksong, shumingshi}@tencent.com |
| Pseudocode | Yes | Algorithm 1: Complementary learning of word embeddings using CB and SG. (A hedged sketch of this CBOW/Skip-Gram scheme appears below the table.) |
| Open Source Code | No | The paper does not provide explicit information or a link to open-source code for the described methodology. |
| Open Datasets | Yes | We prepare the latest dump of Wikipedia articles (https://dumps.wikimedia.org/enwiki/latest/) as the base corpus for training word embeddings, which contains approximately 2 billion word tokens. We use the MEN-3k [Bruni et al., 2012], SimLex-999 [Hill et al., 2015] and WS-353 [Finkelstein et al., 2002] data sets... The extrinsic evaluation is conducted on text classification with four datasets: the 20Newsgroups (20NG; bydate version: http://qwone.com/~jason/20Newsgroups/) for topic classification, ATIS [Hemphill et al., 1990] for intent classification, TREC [Li and Roth, 2002] for question type classification and IMDB [Maas et al., 2011] for sentiment classification. |
| Dataset Splits | Yes | All datasets are organized following their standard split. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All baseline and our embedding models are trained with the same hyper-parameters, i.e., 200 dimensions, 5 as the word frequency cutoff, a window size of 5 words, 4 iterations, using hierarchical softmax as the learning strategy. ... discount learning rates γ1 and γ2 are required as input. ... hyper-parameter λ adjusting the contribution of different sub-rewards. (See the configuration sketch below the table.) |
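
The Pseudocode row cites Algorithm 1, which couples CBOW (CB) and Skip-Gram (SG) so that each objective refines the other's embeddings. As an illustration only, the following minimal NumPy sketch alternates a CB step and an SG step over each target/context pair, applying the discount rates γ1 and γ2 as plain learning-rate multipliers. The function name, the default values, and the bare positive-example sigmoid objective (no hierarchical softmax, no negative sampling, and no λ-weighted reward) are all simplifying assumptions, not the authors' Algorithm 1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def complementary_sketch(corpus, vocab, dim=200, window=5, epochs=4,
                         lr=0.025, gamma1=0.9, gamma2=0.9):
    """Hypothetical alternating CB/SG loop; not the paper's released code.

    corpus: iterable of token lists; vocab: dict mapping word -> row index.
    """
    rng = np.random.default_rng(0)
    W = rng.uniform(-0.5 / dim, 0.5 / dim, (len(vocab), dim))  # input vectors
    C = np.zeros((len(vocab), dim))                            # output vectors
    for _ in range(epochs):
        for sent in corpus:
            ids = [vocab[w] for w in sent if w in vocab]
            for i, t in enumerate(ids):
                ctx = ids[max(0, i - window):i] + ids[i + 1:i + 1 + window]
                if not ctx:
                    continue
                # CB step: predict target t from the averaged context,
                # discounted by gamma1 (placeholder for the paper's rate).
                h = W[ctx].mean(axis=0)
                g = (1.0 - sigmoid(h @ C[t])) * lr * gamma1
                for c in ctx:
                    W[c] += g * C[t] / len(ctx)
                C[t] += g * h
                # SG step: predict each context word from t, discounted by
                # gamma2 (the complementary direction).
                for c in ctx:
                    g = (1.0 - sigmoid(W[t] @ C[c])) * lr * gamma2
                    W[t], C[c] = W[t] + g * C[c], C[c] + g * W[t]
    return W  # rows are the learned word embeddings
```

A faithful reproduction would additionally need the paper's hierarchical softmax training and the λ-weighted sub-rewards quoted in the Experiment Setup row.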
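
The Experiment Setup row pins down concrete hyper-parameters, but the report does not identify the training toolkit. As one plausible mapping only, the quoted settings translate onto gensim's Word2Vec (gensim >= 4.0) as below; the helper name train_baseline and the choice of gensim itself are assumptions for illustration.

```python
from gensim.models import Word2Vec

def train_baseline(sentences, skip_gram=False):
    """Train a CB or SG baseline with the hyper-parameters quoted above."""
    return Word2Vec(
        sentences,                 # iterable of tokenized sentences
        vector_size=200,           # 200 dimensions
        min_count=5,               # word frequency cutoff of 5
        window=5,                  # window size of 5 words
        epochs=4,                  # 4 iterations
        hs=1, negative=0,          # hierarchical softmax, no negative sampling
        sg=1 if skip_gram else 0,  # SG vs. CB objective
    )
```

Training both variants (skip_gram=False and skip_gram=True) on the same corpus would yield the CB and SG baselines that the complementary scheme starts from.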