Self-Attentive Hawkes Process
Authors: Qiang Zhang, Aldo Lipani, Omer Kirnap, Emine Yilmaz
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on goodness-of-fit and prediction tasks show the improved capability of SAHP. Furthermore, SAHP is more interpretable than RNN-based counterparts because the learnt attention weights reveal contributions of one event type to the happening of another type. |
| Researcher Affiliation | Collaboration | 1University College London, London, United Kingdom 2Amazon, London, United Kingdom. |
| Pseudocode | No | The paper describes the Self-Attentive Hawkes Process components and logic, but does not include a formal pseudocode block or algorithm box. |
| Open Source Code | Yes | The software used to run these experiments is available at the following weblink: https://github.com/QiangAIResearcher/sahp_repo |
| Open Datasets | Yes | These datasets are all available at the following weblink: https://drive.google.com/drive/folders/0BwqmV0EcoUc8UklIR1BKV25YR1U |
| Dataset Splits | Yes | Each dataset is split into a training set, a validation set and a testing set. The validation set is used to tune the hyper-parameters while the testing set is used to measure the model performance. Details about the datasets can be found in Table 1 and Appendix. Table 1 (per dataset: # of types; sequence length min/mean/max; # of sequences train/validation/test): Synth.: 2; 68/132/269; 3,200/400/400. RT: 3; 50/109/264; 20,000/2,000/2,000. SOF: 22; 41/72/736; 4,777/530/1,326. MMC: 75; 2/4/33; 527/58/65. |
| Hardware Specification | Yes | We report in Table 5 the running time on the retweet dataset with a Titan Xp GPU card. |
| Software Dependencies | No | The paper mentions using the 'tick' Python library for synthetic data generation but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The number of heads is a hyper-parameter. We explore this hyper-parameter in the set {1, 2, 4, 8, 16}. Another hyper-parameter is the number of attention layers. We explore this hyper-parameter in the set {2, 3, 4, 5, 6}. We adopt Adam as the basic optimiser and develop a warm-up stage for the learning rate, whose initialisation is set to 1e-4. To mitigate overfitting we apply dropout with rate set to 0.1. Early stopping is used when the validation loss does not decrease more than 1e-3. A hedged code sketch of this setup follows the table. |
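Below is a minimal training-setup sketch based on the reported settings (head grid {1, 2, 4, 8, 16}, layer grid {2, 3, 4, 5, 6}, Adam at an initial learning rate of 1e-4 with a warm-up stage, dropout rate 0.1, and early stopping once the validation loss stops decreasing by more than 1e-3). It assumes a PyTorch training loop; `build_model`, `val_loss_fn`, the warm-up length, and the patience value are hypothetical placeholders, not taken from the paper or from sahp_repo.

```python
import itertools
import torch

# Hyper-parameter grids reported in the paper's experiment setup.
HEAD_CHOICES = [1, 2, 4, 8, 16]
LAYER_CHOICES = [2, 3, 4, 5, 6]

BASE_LR = 1e-4           # initial learning rate for Adam (from the paper)
DROPOUT = 0.1            # dropout rate used to mitigate overfitting (from the paper)
EARLY_STOP_DELTA = 1e-3  # minimum decrease in validation loss to keep training (from the paper)
WARMUP_STEPS = 1000      # assumption: the warm-up length is not stated in the excerpt


def warmup_lr(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Linear warm-up from 0 to base_lr; the exact schedule is an assumption."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)


def train_one_config(build_model, train_loader, val_loss_fn, n_heads, n_layers,
                     max_epochs=100, patience=5):
    """Train one head/layer configuration with early stopping.

    `build_model` and `val_loss_fn` are hypothetical helpers, not part of the
    official sahp_repo API; the model is assumed to return its training loss.
    """
    model = build_model(n_heads=n_heads, n_layers=n_layers, dropout=DROPOUT)
    optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR)

    best_val, stale, step = float("inf"), 0, 0
    for _epoch in range(max_epochs):
        model.train()
        for batch in train_loader:
            # Apply the warm-up schedule before each update.
            for group in optimizer.param_groups:
                group["lr"] = warmup_lr(step)
            optimizer.zero_grad()
            loss = model(batch)  # assumed to be the point-process negative log-likelihood
            loss.backward()
            optimizer.step()
            step += 1

        val_loss = val_loss_fn(model)
        # Early stopping: stop when validation loss no longer drops by more than 1e-3.
        if best_val - val_loss > EARLY_STOP_DELTA:
            best_val, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return model, best_val


def grid_search(build_model, train_loader, val_loss_fn):
    """Sweep the head/layer grids and keep the configuration with the lowest validation loss."""
    best_model, best_val, best_cfg = None, float("inf"), None
    for n_heads, n_layers in itertools.product(HEAD_CHOICES, LAYER_CHOICES):
        model, val = train_one_config(build_model, train_loader, val_loss_fn, n_heads, n_layers)
        if val < best_val:
            best_model, best_val, best_cfg = model, val, (n_heads, n_layers)
    return best_model, best_val, best_cfg
```

The grid search keeps the head/layer combination with the lowest validation loss, mirroring the paper's use of the validation set for hyper-parameter tuning and the test set only for the final performance measurement.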