Decomposable Transformer Point Processes
Authors: Aristeidis Panos
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We considered two different tasks to assess the predictive performance of our proposed method: Goodness-of-fit/next-event prediction and long-horizon prediction. We compared our method DTPP to several strong baselines over five real-world datasets and three synthetic ones. |
| Researcher Affiliation | Academia | Aristeidis Panos University of Cambridge ap2313@cam.ac.uk |
| Pseudocode | Yes | Algorithm 1 Long-Horizon Prediction for Decomposed Transformer Point Processes; Algorithm 2 Thinning Algorithm (a generic thinning sketch is given after this table) |
| Open Source Code | Yes | Our framework was implemented with PyTorch [31] and scikit-learn [32]; the code is available at https://github.com/aresPanos/dtpp. |
| Open Datasets | Yes | We fit the above six models on a diverse collection of five popular real-world datasets, each with varied characteristics: MIMIC-II [19], Amazon [28], Taxi [37], Taobao [43], and Stack Overflow V1 [20, 41]. |
| Dataset Splits | Yes | We use 200 epochs in total, a batch size of 8 sequences, and we apply early-stopping based on the log-likelihood of the held-out dev set. |
| Hardware Specification | No | Section A.2 'Training Details' mentions: 'All experiments were carried out on the same Linux machine with a dedicated reserved GPU used for acceleration.' This description is too general and lacks specific details such as the GPU model, CPU type, or memory, which are necessary for hardware reproducibility. |
| Software Dependencies | No | The paper mentions 'Our framework was implemented with PyTorch [31] and scikit-learn [32]' and also lists other repositories for baselines (e.g., 'https://github.com/yangalan123/anhp-andtt'). However, it does not provide specific version numbers for these software dependencies, which is required for reproducible software descriptions. |
| Experiment Setup | Yes | We use the Adam optimizer [18] with its default settings to train all the models in Section 5. We use 200 epochs in total, a batch size of 8 sequences, and we apply early-stopping based on the log-likelihood of the held-out dev set. ... The hyperparameters D and L were fine-tuned for each combination of dataset and model. We grid-search the two parameters using the search spaces D ∈ {4, 8, 16, 32, 64, 128} and L ∈ {1, 2, 3, 4, 5}. (A minimal sketch of this training and grid-search protocol follows the table.) |
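
The paper's Algorithm 2 is a thinning procedure for sampling events from the learned intensity. For readers unfamiliar with the idea, below is a minimal, generic sketch of Ogata-style thinning in Python; `intensity_fn`, `lambda_bar`, and the example intensity are illustrative placeholders, not the paper's actual model or code (which is available at the repository linked above).

```python
import numpy as np

def thinning_sample(intensity_fn, t_start, t_end, lambda_bar, seed=None):
    """Sample event times in (t_start, t_end] from a temporal point process
    with conditional intensity `intensity_fn`, via Ogata-style thinning.
    `lambda_bar` must upper-bound intensity_fn over the whole interval."""
    rng = np.random.default_rng(seed)
    t, events = t_start, []
    while True:
        # Propose the next candidate from a homogeneous Poisson process
        # with rate lambda_bar (exponential inter-arrival times).
        t += rng.exponential(1.0 / lambda_bar)
        if t > t_end:
            break
        # Accept the candidate with probability intensity_fn(t) / lambda_bar.
        if rng.uniform() * lambda_bar <= intensity_fn(t):
            events.append(t)
    return events

# Example: sample from an exponentially decaying intensity on (0, 10].
if __name__ == "__main__":
    print(thinning_sample(lambda t: 2.0 * np.exp(-0.3 * t), 0.0, 10.0, lambda_bar=2.0))
```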
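
The Experiment Setup row quotes the training protocol: Adam with default settings, 200 epochs, batch size 8, early stopping on the held-out dev-set log-likelihood, and a grid search over D and L. A minimal PyTorch-style sketch of that protocol is shown below, assuming a hypothetical `make_model(D, L)` constructor and a `model.log_likelihood(batch)` method; the authors' actual implementation lives in the linked repository.

```python
import copy
import itertools
import torch

def train_one_config(model, train_loader, dev_loader, epochs=200):
    """Train with Adam at its default settings; early-stop by keeping the
    checkpoint with the best held-out dev-set log-likelihood."""
    optimizer = torch.optim.Adam(model.parameters())  # default hyperparameters, as reported
    best_dev_ll, best_state = float("-inf"), None
    for _ in range(epochs):
        model.train()
        for batch in train_loader:  # batches of 8 event sequences
            loss = -model.log_likelihood(batch)  # hypothetical model method
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            dev_ll = sum(model.log_likelihood(b).item() for b in dev_loader)
        if dev_ll > best_dev_ll:
            best_dev_ll = dev_ll
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return best_dev_ll

def grid_search(make_model, train_loader, dev_loader):
    """Grid-search D and L over the search spaces quoted in the table."""
    best_ll, best_config = float("-inf"), None
    for D, L in itertools.product([4, 8, 16, 32, 64, 128], [1, 2, 3, 4, 5]):
        dev_ll = train_one_config(make_model(D=D, L=L), train_loader, dev_loader)
        if dev_ll > best_ll:
            best_ll, best_config = dev_ll, (D, L)
    return best_config, best_ll
```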