Decomposable Transformer Point Processes

Authors: Aristeidis Panos

NeurIPS 2024

Each entry below gives the reproducibility variable, the assessed result, and the LLM response supporting that assessment.
Research Type: Experimental
"We considered two different tasks to assess the predictive performance of our proposed method: goodness-of-fit/next-event prediction and long-horizon prediction. We compared our method, DTPP, to several strong baselines over five real-world datasets and three synthetic ones."

Researcher Affiliation: Academia
"Aristeidis Panos, University of Cambridge, ap2313@cam.ac.uk"

Pseudocode: Yes
Algorithm 1, "Long-Horizon Prediction for Decomposed Transformer Point Processes", and Algorithm 2, "Thinning Algorithm" (a generic sketch of thinning follows below).

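The paper's Algorithm 2 refers to the standard thinning (Ogata-style) sampler for temporal point processes. Below is a minimal generic sketch, not the paper's exact routine: it assumes a scalar conditional intensity function and a known upper bound `lambda_bar` on that intensity, both hypothetical inputs introduced here for illustration.

```python
import numpy as np

def thinning_sample(intensity, t_start, lambda_bar, rng, t_max=np.inf):
    """Sample the next event time of a point process with conditional
    intensity `intensity(t)` via thinning (rejection sampling).

    `lambda_bar` must upper-bound intensity(t) on [t_start, t_max].
    Returns the accepted event time, or None if t_max is reached first.
    """
    t = t_start
    while t < t_max:
        # Propose a candidate from a homogeneous Poisson(lambda_bar) process.
        t += rng.exponential(1.0 / lambda_bar)
        # Accept the candidate with probability intensity(t) / lambda_bar.
        if rng.uniform() * lambda_bar <= intensity(t):
            return t
    return None
```

For example, `thinning_sample(lambda s: 1.5 + np.sin(s)**2, 0.0, 2.5, np.random.default_rng(0))` draws the next event of a toy intensity whose maximum value 2.5 serves as the bound.
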
Open Source Code: Yes
"Our framework was implemented with PyTorch [31] and scikit-learn [32]; the code is available at https://github.com/aresPanos/dtpp."

Open Datasets: Yes
"We fit the above six models on a diverse collection of five popular real-world datasets, each with varied characteristics: MIMIC-II [19], Amazon [28], Taxi [37], Taobao [43], and Stack Overflow V1 [20, 41]."

Dataset Splits: Yes
"We use 200 epochs in total, a batch size of 8 sequences, and we apply early stopping based on the log-likelihood of the held-out dev set."

Hardware Specification: No
Section A.2, "Training Details", states only: "All experiments were carried out on the same Linux machine with a dedicated reserved GPU used for acceleration." This description is too general; it omits specifics such as the GPU model, CPU type, and memory that hardware reproducibility requires.

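For context, the missing details are cheap to record at run time. Here is a minimal illustrative sketch using standard PyTorch and `platform` calls; it is not part of the paper's released code.

```python
import platform
import torch

def log_hardware():
    """Print the hardware details a reproducibility report would need."""
    print("CPU:", platform.processor() or platform.machine())
    print("OS:", platform.platform())
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print("GPU:", props.name)
        print("GPU memory (GB):", round(props.total_memory / 1024**3, 1))
    else:
        print("GPU: none detected")

log_hardware()
```
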
Software Dependencies: No
The paper states "Our framework was implemented with PyTorch [31] and scikit-learn [32]" and lists repositories for baselines (e.g., https://github.com/yangalan123/anhp-andtt), but it provides no version numbers for these dependencies, which reproducible software descriptions require.

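Similarly, exact versions take two lines to capture; a minimal sketch follows, where the printed values are whatever the local environment happens to contain, not versions reported by the paper. Running `pip freeze > requirements.txt` records the full environment the same way.

```python
import sklearn
import torch

# Record the exact dependency versions for a reproducibility report.
print("torch:", torch.__version__)
print("scikit-learn:", sklearn.__version__)
```
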
Experiment Setup: Yes
"We use the Adam optimizer [18] with its default settings to train all the models in Section 5. We use 200 epochs in total, a batch size of 8 sequences, and we apply early stopping based on the log-likelihood of the held-out dev set. ... The hyperparameters D and L were fine-tuned for each combination of dataset and model. We grid-search the two parameters using the search spaces D ∈ {4, 8, 16, 32, 64, 128} and L ∈ {1, 2, 3, 4, 5}."
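
To make the described protocol concrete, here is a minimal sketch of that setup in PyTorch. The `build_model` constructor and `log_likelihood` objective are hypothetical placeholders (they do not come from the paper's repository), and the `patience` value is an assumption, since the paper reports early stopping but not its patience.

```python
import itertools
import torch

def train_one_config(D, L, train_loader, dev_loader, epochs=200, patience=10):
    """Train one (D, L) configuration with Adam defaults and
    early stopping on held-out dev log-likelihood."""
    model = build_model(embed_dim=D, num_layers=L)    # hypothetical constructor
    optimizer = torch.optim.Adam(model.parameters())  # default settings (lr=1e-3, ...)
    best_dev_ll, epochs_since_best = float("-inf"), 0
    for epoch in range(epochs):                       # 200 epochs in total
        model.train()
        for batch in train_loader:                    # batch size of 8 sequences
            optimizer.zero_grad()
            loss = -log_likelihood(model, batch)      # hypothetical objective
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            dev_ll = sum(log_likelihood(model, b).item() for b in dev_loader)
        if dev_ll > best_dev_ll:
            best_dev_ll, epochs_since_best = dev_ll, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:         # early stopping (patience assumed)
                break
    return best_dev_ll

def grid_search(train_loader, dev_loader):
    """Grid-search D and L over the search spaces reported in the paper."""
    grid = itertools.product([4, 8, 16, 32, 64, 128], [1, 2, 3, 4, 5])
    return max(grid, key=lambda cfg: train_one_config(*cfg, train_loader, dev_loader))
```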