Discovering Fully Oriented Causal Networks
Authors: Osman A Mian, Alexander Marx, Jilles Vreeken. Pages 8975-8982
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through an extensive set of experiments we show GLOBE performs very well in practice, beating the state-of-the-art by a margin. |
| Researcher Affiliation | Academia | Osman Mian, Alexander Marx and Jilles Vreeken CISPA Helmholtz Center for Information Security {osman.mian, alexander.marx, jv}@cispa.de |
| Pseudocode | Yes | Algorithm 1: The GLOBE Algorithm |
| Open Source Code | Yes | For reproducibility we provide detailed pseudo-code in technical appendix, and make all code and data available. GLOBE is implemented in Python and both the source code, as well as the synthetic data are made available for reproducibility. (Footnote 3: http://eda.mmci.uni-saarland.de/globe/) |
| Open Datasets | Yes | We evaluate GLOBE on both synthetic and real-world data with known ground truth. Synthetic data: For reproducibility we provide detailed pseudo-code in technical appendix, and make all code and data available. Real-world data: For real world data with known ground truth, we consider three distinct networks of sizes 5, 15 and 500 nodes from the reged dataset (Statnikov et al. 2015). |
| Dataset Splits | No | The paper states the number of observations for synthetic data ('100 instances each with 1 000 observations') and rows for real-world data ('1 000 rows') but does not specify any explicit training, validation, or testing splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'the open-source implementation in R of Multivariate Adaptive Regression Splines framework (Friedman 1991)' but does not specify version numbers for R or the MARS framework. |
| Experiment Setup | Yes | Due to computational reasons, we only traverse the space of DAGs and not the Markov equivalence classes, which could result in a locally optimal solution. We try to mitigate this using the edge flipping step during the forward search. However, by incorporating a more complex search strategy, like the beam search, we could both expand our search space, and eliminate the need for the edge flip. Our score is specifically defined for continuous valued data. An extension of GLOBE would be to discover causal relationships over discrete and mixed type data. As MDL-based scores have been proposed for inference on discrete (Budhathoki and Vreeken 2017) and mixed (Marx and Vreeken 2018) data, but only for pairs of variables, it would be interesting to extend GLOBE to handle both cases. We instantiate GLOBE using the open-source implementation in R of Multivariate Adaptive Regression Splines framework (Friedman 1991). Since we could face issues like multicollinearity (Farrar and Glauber 1967) and unrealistic run times if we allow for arbitrary many interactions between parents, we restrict the maximum number of interaction terms to 2 for experiments. |
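
The cap on interactions quoted in the Experiment Setup row can be illustrated with a small sketch. It assumes the restriction corresponds to the maximum interaction degree of a MARS-style basis term, that is, the number of distinct parents whose hinge functions are multiplied together. The function names below are hypothetical and the code is not taken from the GLOBE implementation; it only shows how such a cap shrinks the set of parent combinations a regression term may use.

```python
from itertools import combinations
from math import prod


def hinge(x, knot):
    """MARS-style hinge function max(0, x - knot)."""
    return max(0.0, x - knot)


def candidate_parent_sets(parents, max_degree=2):
    """All non-empty parent subsets a single basis term may combine.

    max_degree is an assumed reading of the paper's "maximum number of
    interaction terms"; it bounds how many parents appear in one product.
    """
    sets = []
    for degree in range(1, max_degree + 1):
        sets.extend(combinations(parents, degree))
    return sets


def evaluate_term(row, term):
    """Evaluate a product of hinges; term is a list of (parent, knot) pairs."""
    return prod(hinge(row[parent], knot) for parent, knot in term)


# Hypothetical parent names for one target variable.
parents = ["x1", "x2", "x3", "x4", "x5"]
capped = candidate_parent_sets(parents, max_degree=2)
uncapped = candidate_parent_sets(parents, max_degree=len(parents))
print(len(capped), len(uncapped))

# One degree-2 term: hinge(x1 - 1.0) * hinge(x2 - 0.0) on a single row.
row = {"x1": 2.0, "x2": 0.5}
value = evaluate_term(row, [("x1", 1.0), ("x2", 0.0)])
```

With five parents, the cap leaves 15 candidate parent combinations instead of 31, and the gap widens quickly as the number of parents grows, which is consistent with the run-time concern stated in the quoted passage.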