A Ladder of Causal Distances
Authors: Maxime Peyrard, Robert West
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We put our causal distances to use by benchmarking standard causal discovery systems on both synthetic and real-world datasets for which ground-truth causal models are available. ... Then, we study their behavior in a series of experiments and put them to use in evaluating existing causal discovery systems (Sec. 6). ... We now conduct experiments with the causal distances, on both synthetic and real-world causal models. ... In Fig. 5, we report the evaluation results broken down by model parametrization. |
| Researcher Affiliation | Academia | Maxime Peyrard and Robert West EPFL {maxime.peyrard, robert.west}@epfl.ch |
| Pseudocode | No | The paper mentions describing implementation details in an appendix ('In Appendix D, we describe the practical details related to the implementation and efficient estimation of the causal distances.'), but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block within the provided text. |
| Open Source Code | Yes | Code for reproducing our experiments and an extended version of the paper (with added appendices) are available online: https://github.com/epfl-dlab/causal-distances (code) and https://arxiv.org/abs/2005.02480 (extended version with appendices). |
| Open Datasets | Yes | We considered the following real-world Bayesian causal models from Elidan [2001]: Cancer1, Cancer2, Earthquake, Survey, Protein, Child, and Insurance. ... For synthetic models, we sample random DAGs and random parametrizations using the CDT tool [Kalainathan and Goudet, 2019]. |
| Dataset Splits | No | The paper mentions sampling '2,000 observations' for each model and refers to 'training data' for parameter estimation, but it does not specify explicit train/validation/test splits (e.g., percentages, counts, or predefined splits) for these observations or for validation purposes. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud computing instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'the sample Wasserstein distance', the 'CDT tool', and the 'Pomegranate python framework' but does not specify any version numbers for these software components, which is required for reproducibility. |
| Experiment Setup | Yes | In all experiments, we use the sample Wasserstein distance [Villani, 2008] as the underlying distance D between probability distributions (cf. Eq. 7). ... We construct five models of each type with β = 0.1, 0.5, 1, 2, 5, respectively, resulting in 20 models overall. ... For each model, we sample 2,000 observations from which causal discovery methods should recover the causal model. ... When only a partial DAG is returned, we use the edge orientation that provides the best goodness of fit after the parameters have been estimated. |
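
For readers who want to probe the quoted setup, the sketch below illustrates the kind of sample-based Wasserstein distance that the Experiment Setup row refers to. It is a minimal, hypothetical example: it compares two synthetic 1-D samples with `scipy.stats.wasserstein_distance`, whereas the paper's own estimator (described in its Appendix D) operates on distributions over the full causal model; the variable names and the choice of SciPy here are assumptions, not the authors' implementation.

```python
# Minimal sketch (assumption: SciPy's 1-D estimator stands in for the paper's
# sample Wasserstein distance D between distributions; not the authors' code).
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Hypothetical stand-ins for samples drawn from a ground-truth causal model
# and from a model recovered by a causal discovery system (2,000 observations
# each, mirroring the sample size quoted above).
samples_true = rng.normal(loc=0.0, scale=1.0, size=2000)
samples_recovered = rng.normal(loc=0.2, scale=1.1, size=2000)

# Empirical 1-Wasserstein (earth mover's) distance between the two samples.
d = wasserstein_distance(samples_true, samples_recovered)
print(f"Sample Wasserstein distance: {d:.4f}")
```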