Modeling Order in Neural Word Embeddings at Scale
Authors: Andrew Trask, David Gilmore, Matthew Russell
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The model produces several vector spaces with meaningful substructure, as evidenced by its performance of 85.8% on a recent word-analogy task, exceeding best published syntactic word-analogy scores by a 58% error margin (Pennington et al., 2014). We conduct experiments on the word-analogy task of (Mikolov et al., 2013a). |
| Researcher Affiliation | Industry | Andrew Trask (andrew.trask@digitalreasoning.com), Digital Reasoning Systems, Inc., Nashville, TN, USA; David Gilmore (david.gilmore@digitalreasoning.com), Digital Reasoning Systems, Inc., Nashville, TN, USA; Matthew Russell (matthew.russell@digitalreasoning.com), Digital Reasoning Systems, Inc., Nashville, TN, USA |
| Pseudocode | Yes | Algorithm 1 Dense Interpolated Embedding Pseudocode (a sketch of this algorithm appears below the table). |
| Open Source Code | No | All training occurs over the dataset available from the Google word2vec website, using the packaged word-analogy evaluation script. The dataset contains approximately 8 billion words collected from English News Crawl, 1-Billion-Word Benchmark, UMBC Webbase, and English Wikipedia. The dataset used leverages the default dataphrase2.txt normalization in all training, which includes both single tokens and phrases. Unless otherwise specified, all parameters for training and evaluating are identical to the default parameters specified in the default word2vec big model, which is freely available online. (The paper builds on the third-party word2vec code but does not release its own code as open source.) |
| Open Datasets | Yes | All training occurs over the dataset available from the Google word2vec website, using the packaged word-analogy evaluation script. The dataset contains approximately 8 billion words collected from English News Crawl, 1-Billion-Word Benchmark, UMBC Webbase, and English Wikipedia. |
| Dataset Splits | No | The paper does not provide specific dataset splits for validation or describe a validation methodology. |
| Hardware Specification | Yes | Furthermore, the model includes several parallel training methods, most notably allowing a skip-gram network with 160 billion parameters to be trained overnight on 3 multi-core CPUs, 14x larger than the previous largest neural network (Coates et al., 2013). A 160 billion parameter network was also trained overnight on 3 multi-core CPUs; however, it yielded 20,000-dimensional vectors for each word and subsequently overfit the training data. |
| Software Dependencies | No | The paper mentions using the 'packaged word-analogy evaluation script' and refers to 'word2vec', but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Unless otherwise specified, all parameters for training and evaluating are identical to the default parameters specified in the default word2vec big model, which is freely available online. Table 3 shows the performance of the default CBOW implementation of word2vec relative to CLOW and DIEM when configured to 2000-dimensional embeddings. For the DIEM experiment, each analogy query was first performed by running the query on CLOW and DIEM independently and selecting the top thousand CLOW cosine similarities. We summed the squared cosine similarity of each of these top thousand with each associated cosine similarity returned by the DIEM and re-sorted. (A sketch of this re-ranking follows the algorithm sketch below the table.) |
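
The Pseudocode row above quotes Algorithm 1 (Dense Interpolated Embedding). For reference, here is a minimal NumPy sketch of that interpolation loop; the function name, the character-vector lookup structure, and the final reshape into one concatenated vector are illustrative assumptions, since the paper's pseudocode specifies only the loop itself.

```python
import numpy as np

def dense_interpolated_embedding(word, char_vectors, M):
    """Minimal sketch of Algorithm 1 (Dense Interpolated Embedding).

    word         -- token to embed, as a string
    char_vectors -- dict mapping each character to a d-dimensional
                    NumPy vector (learned elsewhere; assumed structure)
    M            -- interpolation multiplier; output has M * d dims
    """
    d = len(next(iter(char_vectors.values())))
    vm = np.zeros((M, d))                 # M blocks of size d
    l = len(word)
    for i, ch in enumerate(word, start=1):
        s = M * i / l                     # scaled character position
        for j in range(1, M + 1):
            # weight decays quadratically with distance from block j
            w = (1.0 - abs(s - j) / M) ** 2
            vm[j - 1] += w * char_vectors[ch]
    return vm.reshape(-1)                 # concatenate the M blocks
```

Under this reading, a word of any length maps to a fixed (M * d)-dimensional vector, which is what lets DIEM contribute character-level structure alongside the word-level CLOW vectors.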
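
The Experiment Setup row describes how CLOW and DIEM scores were combined per analogy query. The sketch below follows one literal reading of that sentence: form the standard analogy query vector b - a + c (Mikolov et al., 2013a) in each space, keep the top thousand CLOW candidates, score each as squared CLOW cosine similarity plus DIEM cosine similarity, and re-sort. The function names, the dictionary/matrix layout, and the exact combination formula are assumptions, not released code.

```python
import numpy as np

def cosine_sims(query, vectors):
    """Cosine similarity of one query vector against every row of `vectors`."""
    vn = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return vn @ (query / np.linalg.norm(query))

def analogy_query(emb, a, b, c):
    """Standard word2vec analogy vector b - a + c ("a is to b as c is to ?")."""
    return emb[b] - emb[a] + emb[c]

def clow_diem_rerank(clow_emb, diem_emb, clow_vecs, diem_vecs,
                     a, b, c, k=1000):
    """Re-rank one analogy query as described in the row above.

    clow_emb / diem_emb   -- dicts mapping words to vectors (assumed layout)
    clow_vecs / diem_vecs -- vocabulary matrices, one row per word,
                             with matching row order in both spaces
    Returns vocabulary row indices, best candidate first.
    """
    clow_scores = cosine_sims(analogy_query(clow_emb, a, b, c), clow_vecs)
    diem_scores = cosine_sims(analogy_query(diem_emb, a, b, c), diem_vecs)
    top_k = np.argsort(clow_scores)[::-1][:k]      # top thousand CLOW hits
    combined = clow_scores[top_k] ** 2 + diem_scores[top_k]  # assumed formula
    return top_k[np.argsort(combined)[::-1]]       # re-sorted candidates
```

In practice the three query words themselves would also be excluded from the candidate set before scoring, as analogy evaluation scripts conventionally do.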