Using k-Way Co-Occurrences for Learning Word Embeddings
Authors: Danushka Bollegala, Yuichi Yoshida, Ken-ichi Kawarabayashi
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that the derived theoretical relationship does indeed hold empirically, and despite data sparsity, for some smaller k (≤ 5) values, k-way embeddings perform comparably or better than 2-way embeddings in a range of tasks. We evaluate the word embeddings created from k-way co-occurrences on multiple benchmark datasets for semantic similarity measurement, analogy detection, relation classification, and short-text classification (§5.2). |
| Researcher Affiliation | Collaboration | Danushka Bollegala (1), Yuichi Yoshida (2), Ken-ichi Kawarabayashi (2,3). (1) University of Liverpool, Liverpool, L69 3BX, United Kingdom; (2) National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan; (3) Japan Science and Technology Agency, ERATO, Kawarabayashi Large Graph Project |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. |
| Open Datasets | Yes | We pre-processed a January 2017 dump of English Wikipedia using a Perl script [1] and used as our corpus (contains ca. 4.6B tokens). [1] http://mattmahoney.net/dc/textdata.html |
| Dataset Splits | No | The paper mentions using training and test portions for short-text classification, but does not give the specific splits (e.g., percentages or counts) needed to reproduce all experiments, in particular the main word-embedding training. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions a Perl script, AdaGrad, and SGD, but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | The initial learning rate is set to 0.01 in all experiments. Downweighting very frequent co-occurrences of words has been shown to be effective in prior work. This can be easily incorporated into the objective function (5) by replacing h(w_1^k) with a truncated version such as min(h(w_1^k), θ), where θ is a cut-off threshold; we set θ = 100 following prior work. (The truncation is illustrated in the sketch after this table.) |
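
The truncation quoted above is straightforward to illustrate. Below is a minimal Python sketch, not the authors' code: it counts k-way co-occurrences (sets of k distinct word types appearing together within a fixed context window) over a toy corpus and then applies h → min(h, θ) with θ = 100. The function names, the sliding-window counting scheme, the window size, and the toy corpus are all illustrative assumptions; only the truncation rule and θ = 100 come from the quoted setup.

```python
from collections import Counter
from itertools import combinations


def kway_cooccurrence_counts(tokens, k=3, window=5):
    """Count k-way co-occurrences: each set of k distinct word types that
    appears together inside a sliding context window. (This counting scheme
    is an assumption for illustration, not the paper's exact procedure.)"""
    counts = Counter()
    for i in range(len(tokens) - window + 1):
        window_types = sorted(set(tokens[i:i + window]))
        for combo in combinations(window_types, k):
            counts[combo] += 1
    return counts


def truncate(counts, theta=100):
    """Down-weight very frequent co-occurrences: h -> min(h, theta),
    with theta = 100 as in the quoted setup."""
    return {combo: min(h, theta) for combo, h in counts.items()}


if __name__ == "__main__":
    toy_corpus = "the cat sat on the mat while the dog sat on the rug".split()
    h = kway_cooccurrence_counts(toy_corpus, k=3, window=5)
    h_truncated = truncate(h)
    for combo, count in sorted(h_truncated.items())[:5]:
        print(combo, count)
```

In the paper's setup, such truncated counts replace h(w_1^k) in objective (5), and the embeddings are then learnt with SGD/AdaGrad at an initial learning rate of 0.01.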