Approximating Word Ranking and Negative Sampling for Word Embedding
Authors: Guibing Guo, Shichang Ouyang, Fajie Yuan, Xingwei Wang
IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical experiments show that OptRank consistently outperforms its counterparts on a benchmark dataset with different sampling scales, especially when the sampled subset is small. The code and datasets can be obtained from https://github.com/ouououououou/OptRank |
| Researcher Affiliation | Academia | Northeastern University, China University of Glasgow, UK |
| Pseudocode | Yes | Algorithm 1: The OptRank learning algorithm |
| Open Source Code | Yes | The code and datasets can be obtained from https://github.com/ouououououou/OptRank |
| Open Datasets | Yes | The training dataset used in our experiments is the Wikipedia 2017 articles (Wiki2017), which contains around 2.3 billion words (14G). Source: http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 |
| Dataset Splits | No | The paper mentions training on Wikipedia 2017 articles and testing on various benchmark datasets (word analogy, word similarity datasets), but does not explicitly describe a validation set or specific train/validation/test splits from the primary training data. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or specific computing environments used for running the experiments. |
| Software Dependencies | No | The paper describes parameter settings for the models but does not provide specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | For the CBOW-p, CBOW-a and OptRank models, as suggested by [Mikolov et al., 2013; Chen et al., 2017], the down-sampling rate is set to 0.001; the learning rate starts at α = 0.025 and decays as α_t = α(1 − t/T), where T is the sample size and t is the index of the current training example. Besides, window size = 8, dimension = 300, and the number of negative samples is 15 on the five subsets and 2 on the whole Wiki2017 dataset, respectively. For the power parameter used in negative sampling, we find that power = 0.75 offers the best accuracy for the CBOW-p and OptRank models, while power = 0.005 is suggested by [Chen et al., 2017] and adopted for CBOW-a. In particular, the value of ε in OptRank should be adjusted to the size of the corpus; we set ε = 0.5 on the five subsets and ε = 1.0 on Wiki2017 (14G). For the WordRank model, we adopt the settings given by [Ji et al., 2015]: logarithm as the objective function, initial scale parameter α = 100 and offset parameter β = 99. The dimension of word vectors is also set to 300. |
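
The experiment setup quoted above fixes a linear learning-rate decay α_t = α(1 − t/T) and a smoothed unigram negative-sampling distribution with exponent power = 0.75. The following is a minimal Python sketch of those two pieces, assuming illustrative function and variable names that are not taken from the authors' released code:

```python
import numpy as np

# Reported hyperparameters (subset setting); names here are hypothetical.
INITIAL_LR = 0.025      # alpha, initial learning rate
NEGATIVE_SAMPLES = 15   # 15 on the five subsets, 2 on full Wiki2017
POWER = 0.75            # unigram exponent for CBOW-p / OptRank

def learning_rate(t, total_examples, initial_lr=INITIAL_LR):
    """Linear decay alpha_t = alpha * (1 - t / T) over the training examples."""
    return initial_lr * max(1.0 - t / total_examples, 1e-4)

def negative_sampling_probs(word_counts, power=POWER):
    """Smoothed unigram distribution: P(w) proportional to count(w)**power."""
    weights = np.asarray(word_counts, dtype=np.float64) ** power
    return weights / weights.sum()

# Example: draw 15 negative samples from a toy 5-word vocabulary.
probs = negative_sampling_probs([100, 50, 20, 5, 1])
negatives = np.random.choice(len(probs), size=NEGATIVE_SAMPLES, p=probs)
```

The small floor on the learning rate mirrors the common word2vec-style practice of never letting α_t reach exactly zero; the paper itself only states the linear decay formula.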