Modeling Order in Neural Word Embeddings at Scale
Authors: Andrew Trask, David Gilmore, Matthew Russell
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The model produces several vector spaces with meaningful substructure, as evidenced by its performance of 85.8% on a recent word-analogy task, exceeding best published syntactic word-analogy scores by a 58% error margin (Pennington et al., 2014). We conduct experiments on the word-analogy task of (Mikolov et al., 2013a). |
| Researcher Affiliation | Industry | Andrew Trask (andrew.trask@digitalreasoning.com), Digital Reasoning Systems, Inc., Nashville, TN, USA; David Gilmore (david.gilmore@digitalreasoning.com), Digital Reasoning Systems, Inc., Nashville, TN, USA; Matthew Russell (matthew.russell@digitalreasoning.com), Digital Reasoning Systems, Inc., Nashville, TN, USA |
| Pseudocode | Yes | Algorithm 1 Dense Interpolated Embedding Pseudocode (a sketch of this algorithm appears below the table). |
| Open Source Code | No | All training occurs over the dataset available from the Google word2vec website, using the packaged word-analogy evaluation script. The dataset contains approximately 8 billion words collected from English News Crawl, 1-Billion-Word Benchmark, UMBC Webbase, and English Wikipedia. The dataset used leverages the default dataphrase2.txt normalization in all training, which includes both single tokens and phrases. Unless otherwise specified, all parameters for training and evaluating are identical to the default parameters specified in the default word2vec big model, which is freely available online. (The paper builds on the third-party word2vec code but does not release its own code as open source.) |
| Open Datasets | Yes | All training occurs over the dataset available from the Google word2vec website, using the packaged word-analogy evaluation script. The dataset contains approximately 8 billion words collected from English News Crawl, 1-Billion-Word Benchmark, UMBC Webbase, and English Wikipedia. |
| Dataset Splits | No | The paper does not provide specific dataset splits for validation or describe a validation methodology. |
| Hardware Specification | Yes | Furthermore, the model includes several parallel training methods, most notably allowing a skip-gram network with 160 billion parameters to be trained overnight on 3 multi-core CPUs, 14x larger than the previous largest neural network (Coates et al., 2013). A 160 billion parameter network was also trained overnight on 3 multi-core CPUs; however, it yielded 20,000-dimensional vectors for each word and subsequently overfit the training data. |
| Software Dependencies | No | The paper mentions using the 'packaged word-analogy evaluation script' and refers to 'word2vec', but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Unless otherwise specified, all parameters for training and evaluating are identical to the default parameters specified in the default word2vec big model, which is freely available online. Table 3 shows the performance of the default CBOW implementation of word2vec relative to CLOW and DIEM when configured to 2000-dimensional embeddings. For the DIEM experiment, each analogy query was first performed by running the query on CLOW and DIEM independently and selecting the top thousand CLOW cosine similarities. We summed the squared cosine similarity of each of these top thousand with each associated cosine similarity returned by the DIEM and re-sorted. (A sketch of this re-ranking follows the algorithm sketch below the table.) |
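
The Pseudocode row above quotes Algorithm 1 (Dense Interpolated Embedding). For reference, here is a minimal NumPy sketch of that interpolation loop; the function name, the character-vector lookup structure, and the final reshape into one concatenated vector are illustrative assumptions, since the paper's pseudocode specifies only the loop itself.

```python
import numpy as np

def dense_interpolated_embedding(word, char_vectors, M):
    """Minimal sketch of Algorithm 1 (Dense Interpolated Embedding).

    word         -- token to embed, as a string
    char_vectors -- dict mapping each character to a d-dimensional
                    NumPy vector (learned elsewhere; assumed structure)
    M            -- interpolation multiplier; output has M * d dims
    """
    d = len(next(iter(char_vectors.values())))
    vm = np.zeros((M, d))                 # M blocks of size d
    l = len(word)
    for i, ch in enumerate(word, start=1):
        s = M * i / l                     # scaled character position
        for j in range(1, M + 1):
            # weight decays quadratically with distance from block j
            w = (1.0 - abs(s - j) / M) ** 2
            vm[j - 1] += w * char_vectors[ch]
    return vm.reshape(-1)                 # concatenate the M blocks
```

Under this reading, a word of any length maps to a fixed (M * d)-dimensional vector, which is what lets DIEM contribute character-level structure alongside the word-level CLOW vectors.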
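
The Experiment Setup row describes how CLOW and DIEM scores were combined per analogy query. The sketch below follows one literal reading of that sentence: form the standard analogy query vector b - a + c (Mikolov et al., 2013a) in each space, keep the top thousand CLOW candidates, score each as squared CLOW cosine similarity plus DIEM cosine similarity, and re-sort. The function names, the dictionary/matrix layout, and the exact combination formula are assumptions, not released code.

```python
import numpy as np

def cosine_sims(query, vectors):
    """Cosine similarity of one query vector against every row of `vectors`."""
    vn = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return vn @ (query / np.linalg.norm(query))

def analogy_query(emb, a, b, c):
    """Standard word2vec analogy vector b - a + c ("a is to b as c is to ?")."""
    return emb[b] - emb[a] + emb[c]

def clow_diem_rerank(clow_emb, diem_emb, clow_vecs, diem_vecs,
                     a, b, c, k=1000):
    """Re-rank one analogy query as described in the row above.

    clow_emb / diem_emb   -- dicts mapping words to vectors (assumed layout)
    clow_vecs / diem_vecs -- vocabulary matrices, one row per word,
                             with matching row order in both spaces
    Returns vocabulary row indices, best candidate first.
    """
    clow_scores = cosine_sims(analogy_query(clow_emb, a, b, c), clow_vecs)
    diem_scores = cosine_sims(analogy_query(diem_emb, a, b, c), diem_vecs)
    top_k = np.argsort(clow_scores)[::-1][:k]      # top thousand CLOW hits
    combined = clow_scores[top_k] ** 2 + diem_scores[top_k]  # assumed formula
    return top_k[np.argsort(combined)[::-1]]       # re-sorted candidates
```

In practice the three query words themselves would also be excluded from the candidate set before scoring, as analogy evaluation scripts conventionally do.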