User-Dependent Neural Sequence Models for Continuous-Time Event Data
Authors: Alex Boyd, Robert Bamler, Stephan Mandt, Padhraic Smyth
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our methods on four large real-world datasets and demonstrate systematic improvements from our approach over existing work for a variety of predictive metrics such as log-likelihood, next event ranking, and source-of-sequence identification. |
| Researcher Affiliation | Academia | Alex Boyd¹, Robert Bamler², Stephan Mandt¹,², Padhraic Smyth¹,² — ¹Department of Statistics, ²Department of Computer Science, University of California, Irvine. {alexjb, rbamler, mandt}@uci.edu, smyth@ics.uci.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code for modeling and experiments can be found at the following repository: https://github.com/ajboyd2/vae_mpp. |
| Open Datasets | Yes | All models were trained and evaluated on four real-world datasets (see Table 1). The MemeTracker dataset [Leskovec and Krevl, 2014]... The Reddit comments dataset [Baumgartner et al., 2020]... Amazon Reviews [Ni et al., 2019]... The fourth dataset, LastFM [Celma, 2010] |
| Dataset Splits | Yes | Training, validation, and test sets were split so that there were no users in common between them. ... Table 1: Statistics for the four datasets. Columns (left to right) are: ... total number of sequences and number of unique users in training/validation/test splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly state specific software dependencies or their version numbers required to replicate the experiments. |
| Experiment Setup | Yes | Models were trained by minimizing Eq. 7 and Eq. 8, averaged over training sequences, for the decoder-only and MoE variants respectively via the Adam optimizer with default hyperparameters [Kingma and Ba, 2014] and a learning rate of 0.001. A linear warm-up schedule for the learning rate over the first training epoch was used as it led to more stable training across runs. We also performed cyclical annealing on β in Eq. 8 from 0 to 0.001 with a period of 20% of an epoch to help prevent the posterior distribution from collapsing to the prior [Fu et al., 2019]. |
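
To make the "Dataset Splits" evidence above concrete, here is a minimal sketch of a user-disjoint split, i.e., partitioning sequences so that no user appears in more than one of the training, validation, and test sets. This is not the authors' code: the `user_id` field, the split fractions, and the function name are illustrative assumptions.

```python
import random
from collections import defaultdict

def split_by_user(sequences, val_frac=0.1, test_frac=0.1, seed=0):
    """Split event sequences so no user appears in more than one split.

    `sequences` is assumed to be a list of dicts with a "user_id" key;
    the split fractions are illustrative, not the paper's actual ratios.
    """
    # Group all sequences belonging to the same user.
    by_user = defaultdict(list)
    for seq in sequences:
        by_user[seq["user_id"]].append(seq)

    # Shuffle users (not sequences) and carve off validation/test users.
    users = list(by_user)
    random.Random(seed).shuffle(users)
    n_val = int(len(users) * val_frac)
    n_test = int(len(users) * test_frac)
    val_users = set(users[:n_val])
    test_users = set(users[n_val:n_val + n_test])

    train, val, test = [], [], []
    for user, seqs in by_user.items():
        if user in val_users:
            val.extend(seqs)
        elif user in test_users:
            test.extend(seqs)
        else:
            train.extend(seqs)
    return train, val, test
```

Splitting by user rather than by sequence is what makes the source-of-sequence and user-adaptation evaluations meaningful: held-out users are entirely unseen during training.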
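
The "Experiment Setup" row describes the optimization schedule: Adam with a 0.001 learning rate, a linear learning-rate warm-up over the first epoch, and cyclical annealing of β from 0 up to 0.001 with a period of 20% of an epoch. The PyTorch-style sketch below shows one way those pieces fit together; the `model(batch)` interface, the default epoch count, and the linear ramp shape within each β cycle are assumptions, not the authors' implementation (their actual code is in the linked repository).

```python
import torch

def train(model, train_loader, num_epochs=50, base_lr=1e-3, beta_max=1e-3):
    """Sketch of the schedule described above.

    Assumes `model(batch)` returns the negative log-likelihood and KL terms
    of the paper's Eq. 7/8 objectives (placeholder interface); `num_epochs`
    is illustrative and not stated in the paper.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

    steps_per_epoch = len(train_loader)
    warmup_steps = steps_per_epoch                    # warm-up spans the first epoch
    cycle_steps = max(1, int(0.2 * steps_per_epoch))  # beta cycle = 20% of an epoch

    step = 0
    for _ in range(num_epochs):
        for batch in train_loader:
            # Linear learning-rate warm-up over the first training epoch.
            for group in optimizer.param_groups:
                group["lr"] = base_lr * min(1.0, (step + 1) / warmup_steps)

            # Cyclical annealing: beta ramps from 0 toward beta_max in each cycle
            # (a simple linear ramp; the exact annealing shape may differ).
            beta = beta_max * ((step % cycle_steps) / cycle_steps)

            nll, kl = model(batch)   # assumed interface: NLL and KL terms
            loss = nll + beta * kl   # Eq. 8-style MoE objective; Eq. 7 has no KL term

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
```

The warm-up and the cyclical β schedule serve different purposes: the former stabilizes early optimization across runs, while the latter (following Fu et al., 2019) keeps the KL term from collapsing the posterior onto the prior.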