Spectral Word Embedding with Negative Sampling

Authors: Behrouz Haji Soleimani, Stan Matwin

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have trained various word embedding algorithms on articles of Wikipedia with 2.1 billion tokens and show that negative sampling can boost the quality of spectral methods. Our algorithm provides results as good as the state-of-the-art but in a much faster and efficient way.
Researcher Affiliation | Academia | Behrouz Haji Soleimani, Stan Matwin; Faculty of Computer Science, Dalhousie University; Institute for Big Data Analytics; 6050 University Avenue, Halifax, NS, Canada; behrouz.hajisoleimani@dal.ca, st837183@dal.ca
Pseudocode | No | The paper describes the steps of its algorithm in paragraph form (e.g., 'Our approach at a glance builds...'), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. (An illustrative sketch of a generic spectral pipeline is given after this table.)
Open Source Code | Yes | The source code of our algorithm is available at: https://github.com/behrouzhs/svdns.
Open Datasets | Yes | For the training of models, we have used English Wikipedia dump of March 05, 2016.
Dataset Splits | No | The paper mentions training on Wikipedia and evaluating on other datasets (word similarity, analogy) but does not specify a separate validation split of the training data.
Hardware Specification | No | The paper does not specify any hardware details such as CPU models, GPU models, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | Yes | The dimensionality of embeddings is 100 in all our experiments. ... GloVe is trained with its recommended parameter setting (i.e. xmax = 100), CBOW and SGNS are trained with negative sampling set to 5. Our proposed algorithm, SVD-NS, is trained with α = 2.5.
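The table does not pin down which implementation was used for the baselines, but the reported hyperparameters (100-dimensional vectors, negative sampling set to 5) can be made concrete. The snippet below is a hedged illustration using gensim 4.x's Word2Vec; the toy corpus, min_count value, and the choice of gensim itself are assumptions, and GloVe's xmax = 100 belongs to the separate GloVe toolkit rather than to this snippet.

```python
# Hypothetical reconstruction of the reported CBOW / SGNS baseline settings
# (100-dimensional vectors, 5 negative samples) using gensim 4.x. The corpus is
# a placeholder; the paper trains on a 2016 English Wikipedia dump.
from gensim.models import Word2Vec

corpus = [["the", "cat", "sat", "on", "the", "mat"]]  # placeholder sentences

cbow = Word2Vec(sentences=corpus, vector_size=100, negative=5, sg=0, min_count=1)
sgns = Word2Vec(sentences=corpus, vector_size=100, negative=5, sg=1, min_count=1)
```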
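Since the Pseudocode row notes that the algorithm is described only in prose, the following minimal sketch shows a generic spectral pipeline from the same family: count word-word co-occurrences, apply a shifted positive-PMI transform (the log k shift mirrors the k negative samples of SGNS), and take a truncated SVD. This is not the authors' SVD-NS transformation, and in particular the role of their α = 2.5 parameter is not reproduced; all function names and the toy corpus are illustrative.

```python
# Hypothetical sketch of a generic spectral embedding pipeline (shifted PPMI +
# truncated SVD). It is NOT the paper's SVD-NS transformation, only an
# illustration of the same family of methods.
import numpy as np
from collections import Counter
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import svds

def cooccurrence(sentences, window=5):
    """Symmetric word-word co-occurrence counts within a fixed window."""
    vocab = {w: i for i, w in enumerate(sorted({w for s in sentences for w in s}))}
    counts = Counter()
    for sent in sentences:
        ids = [vocab[w] for w in sent]
        for i, wi in enumerate(ids):
            lo, hi = max(0, i - window), min(len(ids), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(wi, ids[j])] += 1.0
    rows, cols, vals = zip(*((r, c, v) for (r, c), v in counts.items()))
    n = len(vocab)
    return vocab, coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

def shifted_ppmi(C, k=5.0):
    """max(PMI - log k, 0); the log k shift mirrors k negative samples in SGNS."""
    total = C.sum()
    row_sums = np.asarray(C.sum(axis=1)).ravel()
    col_sums = np.asarray(C.sum(axis=0)).ravel()
    C = C.tocoo()
    pmi = np.log(C.data * total / (row_sums[C.row] * col_sums[C.col]))
    data = np.maximum(pmi - np.log(k), 0.0)
    return coo_matrix((data, (C.row, C.col)), shape=C.shape).tocsr()

def spectral_embed(M, dim=100):
    """Truncated SVD; rows of U * sqrt(S) serve as word vectors."""
    k = min(dim, min(M.shape) - 1)
    U, S, _ = svds(M.astype(np.float64), k=k)
    return U * np.sqrt(S)

# Toy usage (the paper uses 100 dimensions on a full Wikipedia corpus).
sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]
vocab, C = cooccurrence(sentences, window=2)
vectors = spectral_embed(shifted_ppmi(C), dim=10)
```

Using the rows of U * sqrt(S) as word vectors is a common convention for SVD-based embeddings; the released svdns code should be consulted for the exact transformation and weighting used by SVD-NS.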