Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Causal Discovery Toolbox: Uncovering causal relationships in Python
Authors: Diviyan Kalainathan, Olivier Goudet, Ritik Dutta
JMLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Fig. 2 compares the runtimes of the two PC implementations on synthetic graphs of varying size, connectivity, and number of data points, showing a constant gap with respect to the number of data points and connectivity of the graph. The Cdt package implements an end-to-end approach, recovering the direct dependencies (the skeleton of the causal graph) and the causal relationships between variables. It includes algorithms from the Bnlearn (Scutari, 2018) and Pcalg (Kalisch et al., 2018) packages, together with algorithms for pairwise causal discovery such as ANM (Hoyer et al., 2009). |
| Researcher Affiliation | Collaboration | Diviyan Kalainathan EMAIL FenTech, TAU, LRI, INRIA, Université Paris-Sud, 20 Rue Raymond Aron, 75013 Paris, France; Olivier Goudet EMAIL LERIA, Université d'Angers, 2 boulevard Lavoisier, 49045 Angers, France; Ritik Dutta EMAIL IIT Gandhinagar, Gandhinagar, Gujarat 382355, India |
| Pseudocode | No | The paper describes algorithms and a pipeline but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Cdt is available under the MIT License at https://github.com/FenTechSolutions/CausalDiscoveryToolbox. |
| Open Datasets | Yes | by including scoring metrics, and standard benchmark data sets such as the Sachs data set (Sachs et al., 2005). |
| Dataset Splits | No | The paper uses synthetic graphs for runtime comparisons and mentions the Sachs dataset but does not provide specific details on dataset splits for reproduction. |
| Hardware Specification | No | Cdt includes many state-of-the-art causal modeling algorithms (some of which are imported from R) and supports GPU hardware acceleration and automatic hardware detection. The paper mentions only general "GPU hardware acceleration" and names no specific hardware models. |
| Software Dependencies | No | The paper mentions that the Cdt package integrates R and Python scripts, and refers to future plans to reimplement R algorithms in Python Numba and PyTorch algorithms in Chainer. It also mentions using Bnlearn and Pcalg (R packages). However, it does not provide specific version numbers for Python, R, or other key libraries used in the current methodology. |
| Experiment Setup | No | The paper describes a software toolbox and evaluates the runtime of one algorithm (PC) on synthetic graphs by varying size, connectivity, and number of data points, but does not provide specific experimental setup details such as hyperparameters or training configurations for models. |