A Ladder of Causal Distances

Authors: Maxime Peyrard, Robert West

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We put our causal distances to use by benchmarking standard causal discovery systems on both synthetic and real-world datasets for which ground-truth causal models are available. ... Then, we study their behavior in a series of experiments and put them to use in evaluating existing causal discovery systems (Sec. 6). ... We now conduct experiments with the causal distances, on both synthetic and real-world causal models. ... In Fig. 5, we report the evaluation results broken down by model parametrization.
Researcher Affiliation | Academia | Maxime Peyrard and Robert West, EPFL, {maxime.peyrard, robert.west}@epfl.ch
Pseudocode | No | The paper mentions describing implementation details in an appendix ('In Appendix D, we describe the practical details related to the implementation and efficient estimation of the causal distances.'), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block within the provided text.
Open Source Code | Yes | Code for reproducing our experiments [1] and an extended version of the paper (with added appendices) [2] are available online. [1] https://github.com/epfl-dlab/causal-distances [2] https://arxiv.org/abs/2005.02480
Open Datasets | Yes | We considered the following real-world Bayesian causal models from Elidan [2001]: Cancer1, Cancer2, Earthquake, Survey, Protein, Child, and Insurance. ... For synthetic models, we sample random DAGs and random parametrizations using the CDT tool [Kalainathan and Goudet, 2019]. (See the CDT sketch after the table.)
Dataset Splits | No | The paper mentions sampling '2,000 observations' for each model and refers to 'training data' for parameter estimation, but it does not specify explicit train/validation/test splits (e.g., percentages, counts, or predefined splits) for these observations or for validation purposes.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud-computing instance types used for running the experiments.
Software Dependencies | No | The paper mentions using 'the sample Wasserstein distance', the 'CDT tool', and the 'Pomegranate Python framework' but does not specify version numbers for these software components, which reproducibility requires.
Experiment Setup | Yes | In all experiments, we use the sample Wasserstein distance [Villani, 2008] as the underlying distance D between probability distributions (cf. Eq. 7). ... We construct five models of each type with β = 0.1, 0.5, 1, 2, 5, respectively, resulting in 20 models overall. ... For each model, we sample 2,000 observations from which causal discovery methods should recover the causal model. ... When only a partial DAG is returned, we use the edge orientation that provides the best goodness of fit after the parameters have been estimated. (See the Wasserstein sketch after the table.)
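The Open Datasets row notes that the synthetic models are random DAGs with random parametrizations generated via the CDT tool. As an illustration only, here is a minimal sketch of drawing a random DAG and observational samples with CDT's AcyclicGraphGenerator; the 'linear' mechanism and the numeric parameters are assumptions, not values taken from the paper, whose exact generation code lives in the linked repository.

```python
# Hedged sketch: sampling a random DAG plus observational data with the
# Causal Discovery Toolbox (pip install cdt). The mechanism and parameter
# values below are illustrative assumptions, not the authors' settings.
from cdt.data import AcyclicGraphGenerator

generator = AcyclicGraphGenerator(
    "linear",      # assumed causal mechanism (CDT also offers e.g. 'sigmoid_add', 'gp_add')
    nodes=10,      # illustrative graph size
    npoints=2000,  # mirrors the 2,000 observations quoted in the Experiment Setup row
)

# generate() returns a pandas DataFrame of samples and a networkx DiGraph
# holding the ground-truth causal structure.
data, graph = generator.generate()
print(data.shape)           # e.g. (2000, 10)
print(list(graph.edges()))  # ground-truth causal edges
```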
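The Experiment Setup row quotes the use of the sample Wasserstein distance as the underlying distance D between probability distributions, without naming an implementation. Below is a minimal sketch of one standard way to estimate a Wasserstein-1 distance between two empirical samples with the POT (Python Optimal Transport) library; the helper name sample_wasserstein and the choice of POT are assumptions for illustration, not the authors' code.

```python
# Hedged sketch: Wasserstein-1 distance between two empirical samples using
# POT (pip install pot). One common recipe, not necessarily the paper's.
import numpy as np
import ot  # POT: Python Optimal Transport


def sample_wasserstein(x: np.ndarray, y: np.ndarray) -> float:
    """Wasserstein-1 distance between the empirical distributions of x and y.

    x, y: arrays of shape (n, d) and (m, d) holding i.i.d. samples.
    """
    a = np.full(len(x), 1.0 / len(x))         # uniform weights on x's samples
    b = np.full(len(y), 1.0 / len(y))         # uniform weights on y's samples
    cost = ot.dist(x, y, metric="euclidean")  # pairwise ground-cost matrix
    return ot.emd2(a, b, cost)                # exact optimal-transport cost


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = rng.normal(0.0, 1.0, size=(2000, 3))  # 2,000 samples, as in the paper
    q = rng.normal(0.5, 1.0, size=(2000, 3))
    print(sample_wasserstein(p, q))
```

For one-dimensional marginals, scipy.stats.wasserstein_distance would be a lighter-weight alternative to the full optimal-transport solve.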