Neural Word Embedding as Implicit Matrix Factorization
Authors: Omer Levy, Yoav Goldberg
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the word representations on four datasets, covering word similarity and relational analogy tasks. We used two datasets to evaluate pairwise word similarity: Finkelstein et al.'s WordSim353 [13] and Bruni et al.'s MEN [4]. These datasets contain word pairs together with human-assigned similarity scores. |
| Researcher Affiliation | Academia | Omer Levy Department of Computer Science Bar-Ilan University omerlevy@gmail.com Yoav Goldberg Department of Computer Science Bar-Ilan University yoav.goldberg@gmail.com |
| Pseudocode | No | The paper describes methods such as SGNS and SVD but does not present them in a structured pseudocode or algorithm block (an illustrative sketch follows the table below). |
| Open Source Code | Yes | To train the SGNS models, we used a modified version of word2vec which receives a sequence of pre-extracted word-context pairs [18]. ... http://www.bitbucket.org/yoavgo/word2vecf |
| Open Datasets | Yes | All models were trained on English Wikipedia, pre-processed by removing non-textual elements, sentence splitting, and tokenization. The corpus contains 77.5 million sentences, spanning 1.5 billion tokens. |
| Dataset Splits | No | No explicit training/validation/test dataset splits (e.g., percentages, sample counts, or cross-validation setup) are provided for the English Wikipedia corpus used for training. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned. |
| Software Dependencies | No | The paper mentions using 'a modified version of word2vec' but does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | All models were derived using a window of 2 tokens to each side of the focus word, ignoring words that appeared less than 100 times in the corpus, resulting in vocabularies of 189,533 terms for both words and contexts. ... We experimented with three values of k (number of negative samples in SGNS, shift parameter in PMI-based methods): 1, 5, 15. |
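
Since the paper provides no pseudocode (see the "Pseudocode" row above), here is a minimal, hedged sketch of its central construction: SGNS with k negative samples implicitly factorizes the word-context PMI matrix shifted by log k, and clipping negative cells at zero gives the shifted positive PMI (SPPMI) matrix, which can then be factorized explicitly with truncated SVD. The toy corpus, function names, and dimensionality below are illustrative assumptions, not the authors' word2vecf code.

```python
# Sketch of the paper's SPPMI + SVD construction (illustrative; not the
# authors' word2vecf code). SGNS with k negative samples implicitly
# factorizes PMI(w, c) - log k; clipping at zero yields the SPPMI matrix.
import numpy as np
from collections import Counter

def build_sppmi(pairs, k=5):
    """pairs: list of (word, context) tuples; returns the SPPMI matrix."""
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(c for _, c in pairs)
    D = len(pairs)  # |D|: total number of observed word-context pairs

    words, ctxs = sorted(word_counts), sorted(ctx_counts)
    w_idx = {w: i for i, w in enumerate(words)}
    c_idx = {c: j for j, c in enumerate(ctxs)}

    M = np.zeros((len(words), len(ctxs)))
    for (w, c), n in pair_counts.items():
        # PMI(w, c) = log( #(w,c) * |D| / (#(w) * #(c)) )
        pmi = np.log(n * D / (word_counts[w] * ctx_counts[c]))
        # SPPMI_k(w, c) = max(PMI(w, c) - log k, 0)
        M[w_idx[w], c_idx[c]] = max(pmi - np.log(k), 0.0)
    return M, words

def svd_embeddings(M, dim):
    """Truncated SVD with the symmetric weighting W = U_d * sqrt(Sigma_d),
    which the paper reports working well on similarity tasks."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :dim] * np.sqrt(s[:dim])

# Toy usage: an invented two-sentence corpus with the paper's window of 2.
corpus = ["the cat sat on the mat".split(), "the dog sat on the rug".split()]
pairs = [(s[i], s[j]) for s in corpus for i in range(len(s))
         for j in range(max(0, i - 2), min(len(s), i + 3)) if j != i]
M, vocab = build_sppmi(pairs, k=5)
W = svd_embeddings(M, dim=2)  # one 2-d vector per vocabulary word
```

In the paper's actual experiments, k ∈ {1, 5, 15} plays this dual role, serving as the negative-sample count for SGNS and as the PMI shift for the SVD-based methods, matching the "Experiment Setup" row above.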