Dynamic Word Embeddings
Authors: Robert Bamler, Stephan Mandt
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three different corpora demonstrate that our dynamic model infers word embedding trajectories that are more interpretable and lead to higher predictive likelihoods than competing methods that are based on static models trained separately on time slices. |
| Researcher Affiliation | Industry | Disney Research, 4720 Forbes Avenue, Pittsburgh, PA 15213, USA. |
| Pseudocode | No | The paper describes the methods and algorithms in natural language and mathematical formulas, but does not include any formally structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code available or provide a link to a code repository. |
| Open Datasets | Yes | 1. We used data from the Google books corpus (Michel et al., 2011) from the last two centuries (T = 209). This amounts to 5 million digitized books and approximately 10¹⁰ observed words. The corpus consists of n-gram tables with n ∈ {1, …, 5}, annotated by year of publication. We considered the years from 1800 to 2008 (the latest available). In 1800, the size of the data is approximately 7 × 10⁷ words. We used the 5-gram counts, resulting in a context window size of 4. (http://storage.googleapis.com/books/ngrams/books/datasetsv2.html) 2. We used the State of the Union (SoU) addresses of U.S. presidents, which spans more than two centuries, resulting in T = 230 different time steps and approximately 10⁶ observed words. (http://www.presidency.ucsb.edu/sou.php) (A corpus-preprocessing sketch based on these values appears below the table.) |
| Dataset Splits | No | The paper states: 'For DSG-S, we held out 10%, 10% and 20% of the documents from the Google books, SoU, and Twitter corpora for testing, respectively.' It describes held-out data for testing but does not explicitly specify a separate validation split from training or test sets. (A document-level split sketch appears below the table.) |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or specific cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions various models and techniques like 'word2vec', 'skip-gram model', 'Kalman filter', and 't-SNE', but it does not specify any software names with version numbers that would be required to reproduce the experiments. |
| Experiment Setup | Yes | Hyperparameters. The vocabulary for each corpus was constructed from the 10,000 most frequent words throughout the given time period. For the Google books corpus, we chose the embedding dimension d = 200, which was also used in Kim et al. (2014). We set d = 100 for SoU and Twitter... The ratio η = Σ_ij n⁻_ij,t / Σ_ij n⁺_ij,t of negative to positive word-context pairs was η = 1. The precise construction of the matrices n±_t is explained in the supplementary material. We used the global prior variance σ₀² = 1 for all corpora and all algorithms, including the baselines. The diffusion constant D controls the time scale on which information is shared between time steps. ... We used D = 10⁻³ per year for Google books and SoU, and D = 1 per year for the Twitter corpus, which spans a much shorter time range. (Sketches of these hyperparameters and of the diffusion prior appear below the table.) |
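
The quoted preprocessing and hyperparameter choices (a shared 10,000-word vocabulary over the full time span, a context window of 4 derived from the 5-gram counts, and a negative-to-positive ratio η = 1) can be pictured with a minimal sketch. This is not the authors' code: the tokenization, the symmetric window, and the uniform negative-sampling scheme below are assumptions; the paper's exact construction of n±_t is described in its supplementary material.

```python
from collections import Counter
import random

VOCAB_SIZE = 10_000   # most frequent words over the whole time span
WINDOW = 4            # context window size (derived from the 5-gram tables)
ETA = 1.0             # target ratio of negative to positive word-context pairs

def build_vocab(docs_by_year):
    """Vocabulary shared across all time steps (per the quoted setup)."""
    counts = Counter()
    for docs in docs_by_year.values():
        for tokens in docs:               # each document is a list of tokens
            counts.update(tokens)
    return {w: i for i, (w, _) in enumerate(counts.most_common(VOCAB_SIZE))}

def positive_counts(docs, vocab):
    """n+_{ij,t}: co-occurrence counts within a symmetric window of size 4."""
    n_pos = Counter()
    for tokens in docs:
        ids = [vocab[w] for w in tokens if w in vocab]
        for k, i in enumerate(ids):
            for j in ids[k + 1 : k + 1 + WINDOW]:
                n_pos[(i, j)] += 1
                n_pos[(j, i)] += 1
    return n_pos

def negative_counts(n_pos, vocab_size, rng=random.Random(0)):
    """n-_{ij,t}: random word-context pairs whose total mass is ETA times
    the positive total (a simplification of the paper's scheme)."""
    n_neg = Counter()
    for _ in range(int(ETA * sum(n_pos.values()))):
        n_neg[(rng.randrange(vocab_size), rng.randrange(vocab_size))] += 1
    return n_neg
```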
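
The held-out test fractions quoted in the Dataset Splits row (10% for Google books, 10% for SoU, 20% for Twitter, at document level) could be reproduced roughly as follows; only the per-corpus fractions and the document-level granularity come from the paper, the corpus keys and the shuffling are illustrative assumptions.

```python
import random

# Fraction of whole documents held out for testing, per corpus (keys are hypothetical).
TEST_FRACTION = {"google_books": 0.10, "sou": 0.10, "twitter": 0.20}

def train_test_split(documents, corpus_name, seed=0):
    """Hold out a fraction of whole documents for testing; the paper does not
    describe a separate validation split."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    n_test = int(TEST_FRACTION[corpus_name] * len(docs))
    return docs[n_test:], docs[:n_test]   # (train, test)
```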
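
The roles of the global prior variance σ₀² = 1 and the diffusion constant D (10⁻³ per year for Google books and SoU, 1 per year for Twitter) can be illustrated by sampling one word's embedding trajectory from an Ornstein-Uhlenbeck-style prior. The decay factor below is chosen so the marginal variance stays at σ₀²; this mirrors how D couples neighbouring time steps in the paper's model, but the exact transition kernel here is an illustrative assumption, not the paper's formula.

```python
import numpy as np

D = 1e-3          # diffusion constant per year (Google books / SoU setting)
SIGMA0_SQ = 1.0   # global prior variance sigma_0^2
DIM = 200         # embedding dimension d used for Google books
T = 209           # number of yearly time steps (1800-2008)

def sample_trajectory(rng=np.random.default_rng(0)):
    """One word's embedding trajectory u_1..u_T under a stationary Gaussian
    random walk: step variance D, marginal variance held at sigma_0^2."""
    decay = np.sqrt(max(0.0, 1.0 - D / SIGMA0_SQ))  # pulls the walk toward zero
    u = np.empty((T, DIM))
    u[0] = rng.normal(scale=np.sqrt(SIGMA0_SQ), size=DIM)
    for t in range(1, T):
        u[t] = decay * u[t - 1] + rng.normal(scale=np.sqrt(D), size=DIM)
    return u
```

With D = 10⁻³ the decay factor is close to 1, so embeddings drift slowly and information is shared across many years; with D = 1 (the Twitter setting) neighbouring time steps are essentially decoupled under this parameterization.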