Approximating Word Ranking and Negative Sampling for Word Embedding

Authors: Guibing Guo, Shichang Ouyang, Fajie Yuan, Xingwei Wang

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical experiments show that OptRank consistently outperforms its counterparts on a benchmark dataset with different sampling scales, especially when the sampled subset is small. The code and datasets can be obtained from https://github.com/ouououououou/OptRank
Researcher Affiliation | Academia | Northeastern University, China; University of Glasgow, UK
Pseudocode | Yes | Algorithm 1: The OptRank learning algorithm
Open Source Code | Yes | The code and datasets can be obtained from https://github.com/ouououououou/OptRank
Open Datasets | Yes | The training dataset used in our experiments is the Wikipedia 2017 articles (Wiki2017, http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2), which contains around 2.3 billion words (14G).
Dataset Splits | No | The paper mentions training on Wikipedia 2017 articles and testing on various benchmark datasets (word analogy, word similarity datasets), but does not explicitly describe a validation set or specific train/validation/test splits from the primary training data.
Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or specific computing environments used for running the experiments.
Software Dependencies | No | The paper describes parameter settings for the models but does not provide specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions).
Experiment Setup | Yes | For the CBOW-p, CBOW-a and OptRank models, as suggested by [Mikolov et al., 2013; Chen et al., 2017], the down-sampling rate is set to 0.001; the learning rate starts at α = 0.025 and decays as α_t = α(1 − t/T), where T is the sample size and t is the index of the current training example. Besides, window size = 8, dimension = 300, and the number of negative samples is 15 on the five subsets and 2 on the whole Wiki2017 dataset, respectively. For the power parameter used in negative sampling, we find that power = 0.75 offers the best accuracy for the CBOW-p and OptRank models, while power = 0.005 is suggested by [Chen et al., 2017] and adopted for CBOW-a. Specifically, the value of ε in OptRank should be adjusted to the size of the corpus: we set ε to 0.5 on the five subsets and 1.0 on Wiki2017 (14G). For the WordRank model, we adopt the settings given by [Ji et al., 2015]: logarithm as the objective function, initial scale parameter α = 100 and offset parameter β = 99. The dimension of word vectors is also set to 300.
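The two numeric schedules in the setup row — the linear learning-rate decay α_t = α(1 − t/T) and the unigram distribution raised to the `power` exponent for negative sampling — can be sketched as follows. This is a minimal illustration following standard word2vec conventions, not the authors' released code; the function names are ours.

```python
import numpy as np

def learning_rate(t, T, alpha0=0.025):
    """Linearly decayed learning rate: alpha_t = alpha0 * (1 - t/T)."""
    return alpha0 * (1.0 - t / T)

def negative_sampling_probs(word_counts, power=0.75):
    """Unigram counts raised to `power`, renormalized into a distribution.

    power = 0.75 flattens the raw frequency distribution, so rare words
    are drawn as negatives more often than their raw counts suggest.
    """
    counts = np.asarray(word_counts, dtype=np.float64)
    smoothed = counts ** power
    return smoothed / smoothed.sum()
```

For example, with counts [100, 10, 1] the rarest word's raw probability is 1/111 ≈ 0.009, but after smoothing with power = 0.75 it rises to roughly 0.026, which is the intended effect of the exponent.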