Spectral Word Embedding with Negative Sampling

Authors: Behrouz Haji Soleimani, Stan Matwin

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have trained various word embedding algorithms on articles of Wikipedia with 2.1 billion tokens and show that negative sampling can boost the quality of spectral methods. Our algorithm provides results as good as the state-of-the-art but in a much faster and efficient way.
Researcher Affiliation | Academia | Behrouz Haji Soleimani, Stan Matwin; Faculty of Computer Science, Dalhousie University; Institute for Big Data Analytics; 6050 University Avenue, Halifax, NS, Canada; behrouz.hajisoleimani@dal.ca, st837183@dal.ca
Pseudocode | No | The paper describes the steps of its algorithm in paragraph form (e.g., 'Our approach at a glance builds...'), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. (An illustrative sketch of a generic spectral pipeline is given after this table.)
Open Source Code | Yes | The source code of our algorithm is available at: https://github.com/behrouzhs/svdns.
Open Datasets | Yes | For the training of models, we have used English Wikipedia dump of March 05, 2016.
Dataset Splits | No | The paper mentions training on Wikipedia and evaluating on other datasets (word similarity, analogy) but does not specify a separate validation split of the training data.
Hardware Specification | No | The paper does not specify any hardware details such as CPU models, GPU models, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | Yes | The dimensionality of embeddings is 100 in all our experiments. ... GloVe is trained with its recommended parameter setting (i.e. xmax = 100), CBOW and SGNS are trained with negative sampling set to 5. Our proposed algorithm, SVD-NS, is trained with α = 2.5.
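The table does not pin down which implementation was used for the baselines, but the reported hyperparameters (100-dimensional vectors, negative sampling set to 5) can be made concrete. The snippet below is a hedged illustration using gensim 4.x's Word2Vec; the toy corpus, min_count value, and the choice of gensim itself are assumptions, and GloVe's xmax = 100 belongs to the separate GloVe toolkit rather than to this snippet.

```python
# Hypothetical reconstruction of the reported CBOW / SGNS baseline settings
# (100-dimensional vectors, 5 negative samples) using gensim 4.x. The corpus is
# a placeholder; the paper trains on a 2016 English Wikipedia dump.
from gensim.models import Word2Vec

corpus = [["the", "cat", "sat", "on", "the", "mat"]]  # placeholder sentences

cbow = Word2Vec(sentences=corpus, vector_size=100, negative=5, sg=0, min_count=1)
sgns = Word2Vec(sentences=corpus, vector_size=100, negative=5, sg=1, min_count=1)
```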
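Since the Pseudocode row notes that the algorithm is described only in prose, the following minimal sketch shows a generic spectral pipeline from the same family: count word-word co-occurrences, apply a shifted positive-PMI transform (the log k shift mirrors the k negative samples of SGNS), and take a truncated SVD. This is not the authors' SVD-NS transformation, and in particular the role of their α = 2.5 parameter is not reproduced; all function names and the toy corpus are illustrative.

```python
# Hypothetical sketch of a generic spectral embedding pipeline (shifted PPMI +
# truncated SVD). It is NOT the paper's SVD-NS transformation, only an
# illustration of the same family of methods.
import numpy as np
from collections import Counter
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import svds

def cooccurrence(sentences, window=5):
    """Symmetric word-word co-occurrence counts within a fixed window."""
    vocab = {w: i for i, w in enumerate(sorted({w for s in sentences for w in s}))}
    counts = Counter()
    for sent in sentences:
        ids = [vocab[w] for w in sent]
        for i, wi in enumerate(ids):
            lo, hi = max(0, i - window), min(len(ids), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(wi, ids[j])] += 1.0
    rows, cols, vals = zip(*((r, c, v) for (r, c), v in counts.items()))
    n = len(vocab)
    return vocab, coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

def shifted_ppmi(C, k=5.0):
    """max(PMI - log k, 0); the log k shift mirrors k negative samples in SGNS."""
    total = C.sum()
    row_sums = np.asarray(C.sum(axis=1)).ravel()
    col_sums = np.asarray(C.sum(axis=0)).ravel()
    C = C.tocoo()
    pmi = np.log(C.data * total / (row_sums[C.row] * col_sums[C.col]))
    data = np.maximum(pmi - np.log(k), 0.0)
    return coo_matrix((data, (C.row, C.col)), shape=C.shape).tocsr()

def spectral_embed(M, dim=100):
    """Truncated SVD; rows of U * sqrt(S) serve as word vectors."""
    k = min(dim, min(M.shape) - 1)
    U, S, _ = svds(M.astype(np.float64), k=k)
    return U * np.sqrt(S)

# Toy usage (the paper uses 100 dimensions on a full Wikipedia corpus).
sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]
vocab, C = cooccurrence(sentences, window=2)
vectors = spectral_embed(shifted_ppmi(C), dim=10)
```

Using the rows of U * sqrt(S) as word vectors is a common convention for SVD-based embeddings; the released svdns code should be consulted for the exact transformation and weighting used by SVD-NS.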