User-Dependent Neural Sequence Models for Continuous-Time Event Data

Authors: Alex Boyd, Robert Bamler, Stephan Mandt, Padhraic Smyth

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our methods on four large real-world datasets and demonstrate systematic improvements from our approach over existing work for a variety of predictive metrics such as log-likelihood, next event ranking, and source-of-sequence identification.
Researcher Affiliation | Academia | Alex Boyd (1), Robert Bamler (2), Stephan Mandt (1,2), Padhraic Smyth (1,2); (1) Department of Statistics, (2) Department of Computer Science, University of California, Irvine; {alexjb, rbamler, mandt}@uci.edu, smyth@ics.uci.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our source code for modeling and experiments can be found at the following repository: https://github.com/ajboyd2/vae_mpp.
Open Datasets | Yes | All models were trained and evaluated on four real-world datasets (see Table 1). The MemeTracker dataset [Leskovec and Krevl, 2014]... The Reddit comments dataset [Baumgartner et al., 2020]... Amazon Reviews [Ni et al., 2019]... The 4th dataset, Last.fm [Celma, 2010]
Dataset Splits | Yes | Training, validation, and test sets were split so that there were no users in common between them. ... Table 1: Statistics for the four datasets. Columns (left to right) are: ... total number of sequences and number of unique users in training/validation/test splits. (A user-disjoint split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not explicitly state specific software dependencies or their version numbers required to replicate the experiments.
Experiment Setup | Yes | Models were trained by minimizing Eq. 7 and Eq. 8, averaged over training sequences, for the decoder-only and MoE variants respectively, via the Adam optimizer with default hyperparameters [Kingma and Ba, 2014] and a learning rate of 0.001. A linear warm-up schedule for the learning rate over the first training epoch was used as it led to more stable training across runs. We also performed cyclical annealing on β in Eq. 8 from 0 to 0.001 with a period of 20% of an epoch to help prevent the posterior distribution from collapsing to the prior [Fu et al., 2019]. (A training-loop sketch follows the table.)
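
The user-disjoint split quoted in the Dataset Splits row can be reproduced with a few lines of preprocessing. Below is a minimal Python sketch, not taken from the paper or its repository; the list-of-dicts sequence format, the "user_id" field name, and the 80/10/10 proportions are illustrative assumptions.

    import random

    def split_by_user(sequences, val_frac=0.1, test_frac=0.1, seed=0):
        """Split event sequences so that no user appears in more than one split.

        `sequences` is assumed to be a list of dicts with a "user_id" key;
        the field name and split proportions are illustrative assumptions,
        not values taken from the paper or its repository.
        """
        users = sorted({seq["user_id"] for seq in sequences})
        rng = random.Random(seed)
        rng.shuffle(users)

        n_val = int(len(users) * val_frac)
        n_test = int(len(users) * test_frac)
        val_users = set(users[:n_val])
        test_users = set(users[n_val:n_val + n_test])

        train, val, test = [], [], []
        for seq in sequences:
            if seq["user_id"] in val_users:
                val.append(seq)
            elif seq["user_id"] in test_users:
                test.append(seq)
            else:
                train.append(seq)
        return train, val, test

Because the split is done over users rather than sequences, every sequence belonging to a held-out user lands in the same partition, matching the "no users in common" condition quoted above.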
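
The Experiment Setup row pins down the optimizer, the learning-rate warm-up, and the cyclical β annealing. The PyTorch sketch below wires those stated settings together; the `model.loss` interface, the batch loop, and the sawtooth shape of the β cycle are assumptions for illustration (the paper states only the 0-to-0.001 range and the 20%-of-an-epoch period).

    import torch

    def beta_schedule(step, steps_per_epoch, beta_max=1e-3, period_frac=0.2):
        """Cyclical annealing of the KL weight beta: ramp linearly from 0 to
        beta_max over each cycle of length 0.2 * steps_per_epoch. The sawtooth
        shape is an assumption; the paper gives only the range and period."""
        period = max(1, int(period_frac * steps_per_epoch))
        return beta_max * ((step % period) / period)

    def lr_warmup_factor(step, steps_per_epoch):
        """Linear warm-up of the learning rate over the first training epoch."""
        return min(1.0, (step + 1) / steps_per_epoch)

    def train(model, train_loader, n_epochs=10, base_lr=1e-3):
        # Adam with default hyperparameters and learning rate 0.001, as stated.
        optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)
        scheduler = torch.optim.lr_scheduler.LambdaLR(
            optimizer, lambda step: lr_warmup_factor(step, len(train_loader)))

        step = 0
        for _ in range(n_epochs):
            for batch in train_loader:
                beta = beta_schedule(step, len(train_loader))
                # `model.loss` returning (nll, kl) is a hypothetical interface;
                # the MoE objective (Eq. 8) is an ELBO-style loss whose KL term
                # is weighted by the annealed beta.
                nll, kl = model.loss(batch)
                loss = nll + beta * kl

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                scheduler.step()
                step += 1

For the decoder-only variant (Eq. 7) the same loop applies with the β-weighted KL term dropped, i.e. only the negative log-likelihood is minimized.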