Using k-Way Co-Occurrences for Learning Word Embeddings

Authors: Danushka Bollegala, Yuichi Yoshida, Ken-ichi Kawarabayashi

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that the derived theoretical relationship does indeed hold empirically, and despite data sparsity, for some smaller k (≤ 5) values, k-way embeddings perform comparably or better than 2-way embeddings in a range of tasks. We evaluate the word embeddings created from k-way co-occurrences on multiple benchmark datasets for semantic similarity measurement, analogy detection, relation classification, and short-text classification (Section 5.2).
Researcher Affiliation | Collaboration | Danushka Bollegala (1), Yuichi Yoshida (2), Ken-ichi Kawarabayashi (2,3); (1) University of Liverpool, Liverpool, L69 3BX, United Kingdom; (2) National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan; (3) Japan Science and Technology Agency, ERATO, Kawarabayashi Large Graph Project
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology.
Open Datasets | Yes | We pre-processed a January 2017 dump of English Wikipedia using a Perl script [1] and used it as our corpus (contains ca. 4.6B tokens). [1] http://mattmahoney.net/dc/textdata.html
Dataset Splits | No | The paper mentions using training and test portions for short-text classification, but does not provide specific details on dataset splits (e.g., percentages or counts) needed to reproduce all experiments, especially the main word embedding training.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions a Perl script, AdaGrad, and SGD, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | The initial learning rate is set to 0.01 in all experiments. Down-weighting very frequent co-occurrences of words has been shown to be effective in prior work. This can be easily incorporated into the objective function (5) by replacing h(w_1^k) with a truncated version such as min(h(w_1^k), θ^k), where θ is a cut-off threshold; we set θ = 100 following prior work.
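
The Research Type and Experiment Setup rows above describe the two preparatory steps of the pipeline: collecting k-way co-occurrence counts h(w_1, ..., w_k) from the corpus and down-weighting very frequent tuples by truncating the count at a cut-off θ = 100. The sketch below illustrates both steps under stated assumptions: the fixed sliding-window definition of a co-occurrence context, the function names, and the use of a flat per-tuple cap of θ = 100 (rather than a k-dependent cap) are illustrative choices on my part, not the authors' released implementation.

```python
from collections import Counter
from itertools import combinations

def kway_cooccurrence_counts(tokens, k=3, window=5):
    """Count k-way co-occurrences h(w_1, ..., w_k): the number of sliding
    context windows in which k distinct words all appear together.

    Assumption: the context is a fixed window of `window` consecutive
    tokens; the paper may define contexts differently.
    """
    counts = Counter()
    for start in range(max(len(tokens) - window + 1, 1)):
        context = sorted(set(tokens[start:start + window]))
        for combo in combinations(context, k):  # every k-subset of the window
            counts[combo] += 1
    return counts

def truncated_count(h, theta=100):
    """Down-weight very frequent co-occurrences by capping the raw count at
    a cut-off threshold theta (theta = 100 in the quoted setup)."""
    return min(h, theta)

# Toy usage: count 3-way co-occurrences, then cap them before fitting embeddings.
tokens = "the cat sat on the mat and the cat slept on the mat".split()
h = kway_cooccurrence_counts(tokens, k=3, window=4)
capped = {combo: truncated_count(c) for combo, c in h.items()}
print(sorted(capped.items(), key=lambda kv: -kv[1])[:3])
```

The capped counts are what the objective function (5) quoted above would consume; the AdaGrad/SGD optimisation with initial learning rate 0.01 mentioned in the Software Dependencies and Experiment Setup rows applies to fitting the embedding parameters against these counts, not to the counting step itself.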