Self-Attentive Hawkes Process
Authors: Qiang Zhang, Aldo Lipani, Omer Kirnap, Emine Yilmaz
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on goodness-of-fit and prediction tasks show the improved capability of SAHP. Furthermore, SAHP is more interpretable than RNN-based counterparts because the learnt attention weights reveal contributions of one event type to the happening of another type. |
| Researcher Affiliation | Collaboration | 1University College London, London, United Kingdom 2Amazon, London, United Kingdom. |
| Pseudocode | No | The paper describes the Self-Attentive Hawkes Process components and logic, but does not include a formal pseudocode block or algorithm box. |
| Open Source Code | Yes | The software used to run these experiments is available at the following weblink: https://github.com/QiangAIResearcher/sahp_repo |
| Open Datasets | Yes | These datasets are all available at the following weblink: https://drive.google.com/drive/folders/0BwqmV0EcoUc8UklIR1BKV25YR1U |
| Dataset Splits | Yes | Each dataset is split into a training set, a validation set and a testing set. The validation set is used to tune the hyper-parameters while the testing set is used to measure the model performance. Details about the datasets can be found in Table 1 and Appendix. Table 1 (per dataset: # of types; sequence length min/mean/max; # of sequences train/validation/test): Synth.: 2; 68/132/269; 3,200/400/400. RT: 3; 50/109/264; 20,000/2,000/2,000. SOF: 22; 41/72/736; 4,777/530/1,326. MMC: 75; 2/4/33; 527/58/65. |
| Hardware Specification | Yes | We report in Table 5 the running time on the retweet dataset with a Titan Xp GPU card. |
| Software Dependencies | No | The paper mentions using the 'tick' Python library for synthetic data generation but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The number of heads is a hyper-parameter. We explore this hyper-parameter in the set {1, 2, 4, 8, 16}. Another hyper-parameter is the number of attention layers. We explore this hyper-parameter in the set {2, 3, 4, 5, 6}. We adopt Adam as the basic optimiser and develop a warm-up stage for the learning rate, whose initialisation is set to 1e-4. To mitigate overfitting we apply dropout with rate set to 0.1. Early stopping is used when the validation loss does not decrease more than 1e-3. A hedged code sketch of this setup follows the table. |
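Below is a minimal training-setup sketch based on the reported settings (head grid {1, 2, 4, 8, 16}, layer grid {2, 3, 4, 5, 6}, Adam at an initial learning rate of 1e-4 with a warm-up stage, dropout rate 0.1, and early stopping once the validation loss stops decreasing by more than 1e-3). It assumes a PyTorch training loop; `build_model`, `val_loss_fn`, the warm-up length, and the patience value are hypothetical placeholders, not taken from the paper or from sahp_repo.

```python
import itertools
import torch

# Hyper-parameter grids reported in the paper's experiment setup.
HEAD_CHOICES = [1, 2, 4, 8, 16]
LAYER_CHOICES = [2, 3, 4, 5, 6]

BASE_LR = 1e-4           # initial learning rate for Adam (from the paper)
DROPOUT = 0.1            # dropout rate used to mitigate overfitting (from the paper)
EARLY_STOP_DELTA = 1e-3  # minimum decrease in validation loss to keep training (from the paper)
WARMUP_STEPS = 1000      # assumption: the warm-up length is not stated in the excerpt


def warmup_lr(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Linear warm-up from 0 to base_lr; the exact schedule is an assumption."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)


def train_one_config(build_model, train_loader, val_loss_fn, n_heads, n_layers,
                     max_epochs=100, patience=5):
    """Train one head/layer configuration with early stopping.

    `build_model` and `val_loss_fn` are hypothetical helpers, not part of the
    official sahp_repo API; the model is assumed to return its training loss.
    """
    model = build_model(n_heads=n_heads, n_layers=n_layers, dropout=DROPOUT)
    optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR)

    best_val, stale, step = float("inf"), 0, 0
    for _epoch in range(max_epochs):
        model.train()
        for batch in train_loader:
            # Apply the warm-up schedule before each update.
            for group in optimizer.param_groups:
                group["lr"] = warmup_lr(step)
            optimizer.zero_grad()
            loss = model(batch)  # assumed to be the point-process negative log-likelihood
            loss.backward()
            optimizer.step()
            step += 1

        val_loss = val_loss_fn(model)
        # Early stopping: stop when validation loss no longer drops by more than 1e-3.
        if best_val - val_loss > EARLY_STOP_DELTA:
            best_val, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return model, best_val


def grid_search(build_model, train_loader, val_loss_fn):
    """Sweep the head/layer grids and keep the configuration with the lowest validation loss."""
    best_model, best_val, best_cfg = None, float("inf"), None
    for n_heads, n_layers in itertools.product(HEAD_CHOICES, LAYER_CHOICES):
        model, val = train_one_config(build_model, train_loader, val_loss_fn, n_heads, n_layers)
        if val < best_val:
            best_model, best_val, best_cfg = model, val, (n_heads, n_layers)
    return best_model, best_val, best_cfg
```

The grid search keeps the head/layer combination with the lowest validation loss, mirroring the paper's use of the validation set for hyper-parameter tuning and the test set only for the final performance measurement.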