Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Order-Independent Constraint-Based Causal Structure Learning
Authors: Diego Colombo, Marloes H. Maathuis
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the PC-, FCI-, and RFCI-algorithms and their modifications in simulation studies and on a yeast gene expression data set. We show that our modifications yield similar performance in low-dimensional settings and improved performance in high-dimensional settings. |
| Researcher Affiliation | Academia | Diego Colombo, Marloes H. Maathuis; Seminar for Statistics, ETH Zurich, 8092 Zurich, Switzerland |
| Pseudocode | Yes | Algorithm 3.1 The PC-algorithm (oracle version) Require: Conditional independence information among all variables in V, and an ordering order(V) on the variables; Algorithm 3.2 Adjacency search / Step 1 of the PC-algorithm (oracle version); Algorithm 4.1 Step 1 of the PC-stable algorithm (oracle version) |
| Open Source Code | Yes | All software is implemented in the R-package pcalg (Kalisch et al., 2012). |
| Open Datasets | Yes | In particular, we analyzed the yeast gene expression data set of Hughes et al. (2000). |
| Dataset Splits | No | The paper describes the generation of data for simulation studies (e.g., 'We generated 250 random weighted DAGs with p = 1000 and E(N) = 2, and for each weighted DAG we generated an i.i.d. sample of size n = 50') and the characteristics of the yeast gene expression data ('The observational data consist of gene expression levels of 5361 genes for 63 wild-type yeast organisms, and the experimental data consist of gene expression levels of the same 5361 genes for 234 single-gene knockout strains'). However, it does not explicitly provide information about how these datasets were split into training, testing, or validation sets for their experiments, or cite standard splits. |
| Hardware Specification | Yes | Run time in seconds (computed on an AMD Opteron(tm) Processor 6174 using R 2.15.1.) |
| Software Dependencies | Yes | Run time in seconds (computed on an AMD Opteron(tm) Processor 6174 using R 2.15.1.) of PC and PC-stable for the high-dimensional setting with p = 1000 and n = 50. All software is implemented in the R-package pcalg (Kalisch et al., 2012). |
| Experiment Setup | Yes | We estimated each graph for 20 random variable orderings, using the sample versions of (L)PC(-stable), (L)CPC(-stable), and (L)MPC(-stable) in the setting without latents, and the sample versions of RFCI(-stable), CRFCI(-stable), and MRFCI(-stable) in the setting with latents, with tuning parameter α ∈ {0.000625, 0.00125, 0.0025, 0.005, 0.01, 0.02, 0.04}. |
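
The experiment setup quoted in the last row can be read as a small grid: seven significance levels that double from 0.000625 to 0.04, each paired with 20 random variable orderings. The following is a minimal Python sketch of how such a grid might be enumerated, not the authors' code (the paper's software is the R-package pcalg). The function names `random_orderings` and `experiment_grid`, the fixed seed, and the choice to reuse the same 20 orderings for every α are all assumptions made for illustration.

```python
import random

# Significance levels listed in the table: a doubling sequence
# from 0.000625 up to 0.04 (seven values).
ALPHAS = [0.000625 * 2**k for k in range(7)]


def random_orderings(p, n_orderings=20, seed=0):
    """Return n_orderings random permutations of the variable indices 0..p-1.

    The seed is an assumption; the source does not state how the
    orderings were drawn.
    """
    rng = random.Random(seed)
    orderings = []
    for _ in range(n_orderings):
        perm = list(range(p))
        rng.shuffle(perm)
        orderings.append(perm)
    return orderings


def experiment_grid(p, n_orderings=20, seed=0):
    """Pair every alpha with every ordering (assumed full cross product)."""
    orderings = random_orderings(p, n_orderings, seed)
    return [(alpha, order) for alpha in ALPHAS for order in orderings]
```

Under these assumptions, `experiment_grid(p)` yields 7 × 20 = 140 (α, ordering) configurations per graph, each of which would be passed to one of the algorithm variants named in the table.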