Causal Discovery from Event Sequences by Local Cause-Effect Attribution

Authors: Joscha Cüppers, Sascha Xu, Ahmed Musa, Jilles Vreeken

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluation shows that CASCADE outperforms existing methods in settings with instantaneous effects, noise, and multiple colliders, and discovers insightful causal graphs on real-world data.
Researcher Affiliation | Academia | Joscha Cüppers, CISPA Helmholtz Center for Information Security, joscha.cueppers@cispa.de; Sascha Xu, CISPA Helmholtz Center for Information Security, sascha.xu@cispa.de; Ahmed Musa, Institute for Structural Analysis, TU Dresden, ahmed.musa@tu-dresden.de; Jilles Vreeken, CISPA Helmholtz Center for Information Security, jv@cispa.de
Pseudocode | No | The paper describes the CASCADE algorithm in Section 4, but it does not provide a formal pseudocode block or algorithm figure.
Open Source Code | Yes | CASCADE is implemented in Python. We provide the source code, along with the synthetic data generator and the real-world datasets used, online at https://eda.rg.cispa.io/prj/cascade/
Open Datasets | Yes | We evaluate CASCADE on three distinct datasets of real-world event sequences. We begin by evaluating CASCADE on a dataset of network alarms, where the causal structure is known. This data was provided by Huawei for the NeurIPS 2023 CSL-competition and consists of data from a simulated network of devices in which alarms can cause other alarms. We run all methods and get an (unnormalized) SHD score of 42 for CASCADE, 127 for THP, 214 for NPHC, and 1564 for CAUSE. As the network connectivity structure is known, we can take it into account during the search. THP supports this natively, and CASCADE can be trivially constrained to only consider the given edges. CASCADE correctly identifies 142 out of 147 causal edges, THP 20. Neither method reports spurious edges. We show the full recovered graph in Appendix B.5. Second, we run CASCADE on a daily return volatility dataset [42]; we follow the preprocessing of Jalaldoust et al. [39]. Specifically, we turn the time series into an event sequence by rolling a one-year window over the data and registering an event if the last value is among the top 10%. The dataset includes the world's 96 largest publicly traded banks. We show the largest discovered subgraph in Fig. 5. In addition, three unconnected subgraphs are discovered, one covering two banks in Australia and two others connecting banks in Japan, which we provide in Appendix B.5. Daily Activities: We run CASCADE on a dataset of recorded daily activities [43]. Our method reports plausible causal connections such as Sleeping End → Showering Start → Showering End → Breakfast Start → Breakfast End, etc. We show the complete graph in Appendix B.5.
Dataset Splits | No | The paper describes the generation of synthetic data and the use of real-world datasets, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) needed to reproduce the experiments. The paper uses the full dataset for evaluation metrics like SHD, SID, and F1 score.
Hardware Specification | Yes | All experiments were executed on an internal cluster, on compute nodes equipped with an AMD EPYC 7773X 64-Core Processor (2.2 GHz; Turboboost: 3.5 GHz) and 2 TB of RAM, although in practice only a fraction of that was necessary.
Software Dependencies | No | The paper states 'CASCADE is implemented in Python' and mentions generating synthetic data 'using the tick library', but it does not provide specific version numbers for Python, the tick library, or any other software dependencies needed to reproduce the experiments.
Experiment Setup | Yes | For all experiments we consider a geometric distribution, which we shift back to cover instant effects. We set the precision parameter for all experiments to 2. For all synthetic experiments we consider events as potential causes of at most 100 timestamps.
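
The experiment setup mentions a geometric delay distribution that is shifted back to cover instant effects, and a limit of 100 timestamps for potential causes. The sketch below shows one plausible reading: the support is moved from 1, 2, ... to 0, 1, ..., so a delay of 0 gets non-zero probability. The parameter `p`, the function names, and the renormalised truncation at 100 are illustrative assumptions and are not taken from the CASCADE source code.

```python
# Sketch only: a geometric delay distribution whose support starts at 0,
# so that an instantaneous effect (delay 0) has non-zero probability.
# `p`, the helper names, and the renormalised cut-off are assumptions.

MAX_DELAY = 100  # assumed cut-off, mirroring the "at most 100 timestamps" setting


def shifted_geometric_pmf(delay: int, p: float) -> float:
    """P(delay = d) = p * (1 - p)**d for d = 0, 1, 2, ..."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if delay < 0:
        return 0.0
    return p * (1.0 - p) ** delay


def truncated_pmf(delay: int, p: float, max_delay: int = MAX_DELAY) -> float:
    """The same distribution restricted to 0..max_delay and renormalised."""
    if delay < 0 or delay > max_delay:
        return 0.0
    # Total probability mass that the untruncated distribution puts on 0..max_delay.
    mass = 1.0 - (1.0 - p) ** (max_delay + 1)
    return shifted_geometric_pmf(delay, p) / mass
```

Under these assumptions, `shifted_geometric_pmf(0, p)` equals `p`, so larger values of `p` concentrate more mass on instantaneous effects, and `truncated_pmf` sums to 1 over the delays 0 to 100.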