Complementary Learning of Word Embeddings
Authors: Yan Song, Shuming Shi
IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results indicate that our approach can effectively improve the quality of initial embeddings, in terms of intrinsic and extrinsic evaluations. |
| Researcher Affiliation | Industry | Yan Song, Shuming Shi Tencent AI Lab {clksong, shumingshi}@tencent.com |
| Pseudocode | Yes | Algorithm 1: Complementary learning of word embeddings using CB and SG. |
| Open Source Code | No | The paper does not provide explicit information or a link to open-source code for the described methodology. |
| Open Datasets | Yes | We prepare the latest dump of Wikipedia articles (https://dumps.wikimedia.org/enwiki/latest/) as the base corpus for training word embeddings, which contains approximately 2 billion word tokens. We use the MEN-3k [Bruni et al., 2012], Simlex-999 [Hill et al., 2015] and WS-353 [Finkelstein et al., 2002] data sets... The extrinsic evaluation is conducted on text classification with four datasets: the 20Newsgroups (20NG, bydate version from http://qwone.com/~jason/20Newsgroups/) for topic classification, ATIS [Hemphill et al., 1990] for intent classification, TREC [Li and Roth, 2002] for question type classification and IMDB [Maas et al., 2011] for sentiment classification. |
| Dataset Splits | Yes | All datasets are organized following their standard split. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | All baseline and our embedding models are trained with the same hyper-parameters, i.e., 200 dimensions, 5 as the word frequency cutoff, a window size of 5 words, 4 iterations, using hierarchical softmax as learning strategy. ... discount learning rates γ1 and γ2 are required as input. ... hyper-parameter λ adjusting the contribution of different sub-rewards. |
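
The "Experiment Setup" row quotes the hyper-parameters shared by all baseline and proposed models. A minimal sketch of how that baseline configuration might be reproduced with gensim 4.x is given below; the choice of gensim, the corpus file name, and the output path are assumptions, and this covers only the shared baseline settings, not the paper's complementary-learning procedure (the discount rates γ1, γ2 and the weight λ belong to Algorithm 1, which is not released).

```python
# Hedged sketch: baseline word2vec training with the quoted hyper-parameters.
# Assumes gensim 4.x and a pre-tokenized Wikipedia dump, one sentence per line.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

corpus = LineSentence("enwiki_tokenized.txt")  # hypothetical preprocessed dump

model = Word2Vec(
    corpus,
    vector_size=200,    # 200 dimensions
    min_count=5,        # word frequency cutoff of 5
    window=5,           # window size of 5 words
    epochs=4,           # 4 iterations, as quoted in the setup row
    hs=1, negative=0,   # hierarchical softmax as the learning strategy
    sg=1,               # sg=1 for Skip-gram; sg=0 for CBOW (the paper uses both)
)

model.wv.save_word2vec_format("sg_200d.txt")
```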
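
The "Pseudocode" row refers to Algorithm 1, which combines continuous bag-of-words (CB) and skip-gram (SG) learners. Since no code is released, the toy sketch below illustrates only the general idea of interleaving simplified CB and SG updates on shared embedding matrices; it is an assumption-laden illustration, not the paper's algorithm, and it omits the reward terms, the discount rates γ1/γ2, the weight λ, and hierarchical softmax.

```python
import numpy as np

# Toy interleaving of CBOW (CB) and Skip-gram (SG) updates on shared
# embedding matrices; positive pairs only, for brevity.
rng = np.random.default_rng(0)
V, D = 5, 8                                 # toy vocabulary size and dimension
W_in = rng.normal(scale=0.1, size=(V, D))   # input (word) embeddings
W_out = rng.normal(scale=0.1, size=(V, D))  # output (context) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sg_step(center, context, lr):
    """Skip-gram: the center word predicts one context word."""
    grad = sigmoid(W_in[center] @ W_out[context]) - 1.0
    g_in = grad * W_out[context]
    W_out[context] -= lr * grad * W_in[center]
    W_in[center] -= lr * g_in

def cb_step(context_ids, center, lr):
    """CBOW: the averaged context predicts the center word."""
    h = W_in[context_ids].mean(axis=0)
    grad = sigmoid(h @ W_out[center]) - 1.0
    g_in = grad * W_out[center]
    W_out[center] -= lr * grad * h
    W_in[context_ids] -= lr * g_in / len(context_ids)

# Interleave the two learners over the same toy corpus so each objective
# refines the embeddings produced by the other.
corpus = [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0]]
for sent in corpus:
    for i, center in enumerate(sent):
        ctx = [sent[j] for j in range(max(0, i - 2), min(len(sent), i + 3)) if j != i]
        cb_step(np.array(ctx), center, lr=0.025)
        for c in ctx:
            sg_step(center, c, lr=0.025)
```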