Add and Thin: Diffusion for Temporal Point Processes

Authors: David Lüdke, Marin Biloš, Oleksandr Shchur, Marten Lienen, Stephan Günnemann

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments on synthetic and real-world datasets, our model matches the state-of-the-art TPP models in density estimation and strongly outperforms them in forecasting.
Researcher Affiliation | Collaboration | (1) School of Computation, Information and Technology, Technical University of Munich, Germany; (2) Munich Data Science Institute, Technical University of Munich, Germany; (3) Machine Learning Research, Morgan Stanley, United States; (4) Amazon Web Services, Germany
Pseudocode | Yes | Algorithm 1: Sampling (an illustrative sketch of a generic add-and-thin sampling loop is given below the table)
Open Source Code | Yes | Code is available at https://www.cs.cit.tum.de/daml/add-thin
Open Datasets | Yes | ADD-THIN is evaluated on 7 real-world datasets proposed by Shchur et al. [42] and 6 synthetic datasets from Omi et al. [37]. ... We split each dataset into train, validation, and test set containing 60%, 20%, and 20% of the event sequences, respectively.
Dataset Splits | Yes | We split each dataset into train, validation, and test set containing 60%, 20%, and 20% of the event sequences, respectively. (A minimal sequence-split sketch is given below the table.)
Hardware Specification | Yes | All models but the transformer baseline were trained on an Intel Xeon E5-2630 v4 @ 2.20 GHz CPU with 256 GB RAM and an NVIDIA GeForce GTX 1080 Ti. Given its RAM requirement, the transformer baseline had to be trained with batch size 32 on an NVIDIA A100-PCIE-40GB for the Reddit-C and Reddit-S datasets.
Software Dependencies | No | The paper mentions Adam [22] (an optimizer) and Glide [34] (a model/paper reference) but does not provide version numbers for the programming language (e.g., Python), deep learning framework (e.g., PyTorch, TensorFlow), or other software libraries used in the implementation.
Experiment Setup | Yes | For our model, we set the number of diffusion steps to N = 100, apply the cosine beta-schedule proposed in Glide [34], and set λ_HPP = 1 for the noising process. We apply early stopping, hyperparameter tuning, and model selection on the validation set for each model. Further hyperparameter and training details are reported in Appendix D. ... We use a hidden dimension of 32 for all models. Further, we have tuned the learning rate in {0.01, 0.001} for all models, the number of mixture components in {8, 16} for ADD-THIN, RNN and Transformer, the number of knots in {10, 20} for TriTPP, and the number of attention layers in {2, 3} for the transformer baseline. ... the GD baseline has been trained with a batch size of 16, as recommended by the authors. (A sketch of the cosine beta-schedule is given below the table.)
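
The table names Algorithm 1 (Sampling) but does not reproduce it. The following is only a heavily hedged skeleton of a generic add-and-thin style reverse loop, not the paper's algorithm: start from a homogeneous Poisson sample on [0, t_max] and, at each reverse step, let a model decide which events to keep (thin) and sample new events to add. The `model.keep_prob` and `model.add_intensity` interfaces are hypothetical, and the added events here use a constant rate for simplicity, whereas the actual model predicts an inhomogeneous intensity.

```python
import numpy as np

def sample_hpp(rate: float, t_max: float, rng: np.random.Generator) -> np.ndarray:
    """Sample a homogeneous Poisson process on [0, t_max]."""
    n = rng.poisson(rate * t_max)
    return np.sort(rng.uniform(0.0, t_max, size=n))

def reverse_sample(model, t_max: float, num_steps: int = 100,
                   rate_hpp: float = 1.0, seed: int = 0) -> np.ndarray:
    """Illustrative add-and-thin style reverse loop (sketch only, not Algorithm 1).

    Hypothetical model interface:
      model.keep_prob(events, n)      -> per-event keep probabilities at step n
      model.add_intensity(events, n)  -> rate of newly added events at step n
    """
    rng = np.random.default_rng(seed)
    events = sample_hpp(rate_hpp, t_max, rng)            # pure-noise sequence x_N
    for n in range(num_steps, 0, -1):
        keep = rng.uniform(size=len(events)) < model.keep_prob(events, n)   # thin
        added = sample_hpp(model.add_intensity(events, n), t_max, rng)      # add
        events = np.sort(np.concatenate([events[keep], added]))
    return events                                        # approximate sample x_0
```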
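
The 60/20/20 split is over whole event sequences, not individual events. A minimal sketch, assuming the dataset is a list of sequences and the split is a simple shuffled partition (the released code may instead use fixed indices):

```python
import random

def split_sequences(sequences, seed: int = 0):
    """Shuffle event sequences and partition them 60/20/20 into train/val/test."""
    rng = random.Random(seed)
    order = list(range(len(sequences)))
    rng.shuffle(order)
    n_train = int(0.6 * len(order))
    n_val = int(0.2 * len(order))
    train = [sequences[i] for i in order[:n_train]]
    val = [sequences[i] for i in order[n_train:n_train + n_val]]
    test = [sequences[i] for i in order[n_train + n_val:]]
    return train, val, test
```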
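
The experiment setup cites the cosine beta-schedule from Glide [34], which in turn follows the standard cosine schedule of Nichol & Dhariwal. A minimal NumPy sketch with N = 100 steps, assuming the usual s = 0.008 offset and 0.999 clipping (the exact variant used by ADD-THIN may differ in minor details):

```python
import numpy as np

def cosine_beta_schedule(num_steps: int = 100, s: float = 0.008) -> np.ndarray:
    """Cosine noise schedule (Nichol & Dhariwal), as used in Glide; illustrative sketch."""
    steps = np.arange(num_steps + 1)
    f = np.cos(((steps / num_steps) + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]                          # cumulative product of (1 - beta_t)
    betas = 1 - alpha_bar[1:] / alpha_bar[:-1]    # per-step noise levels
    return np.clip(betas, 0.0, 0.999)

betas = cosine_beta_schedule(num_steps=100)       # N = 100 as in the experiment setup
```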