Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Causal Discovery over Clusters of Variables in Markovian Systems

Authors: Tara Anand, Adèle Ribeiro, Jin Tian, George Hripcsak, Elias Bareinboim

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experiments We show performance of CLOC in comparison to a PC-then-Cluster approach, where PC over the entire set of variables is run with variables then grouped by the partition, yielding a graph over clusters. We generate random C-DAGs (3, 5, 6, 7, 8 clusters), and random DAGs (4, 8, 32, 64, 128, 256 variables) compatible with the C-DAGs. A Gaussian distribution (1000, 3000, 10000, 30000 samples) faithful to the DAG is drawn, over which PC-then-Cluster and CLOC are run. Runtime, conditional independence test counts, and structural hamming distances between the resulting graphs of each method and the true C-DAG are shown in Figure 5.
Researcher Affiliation	Academia	Tara V Anand Department of Biomedical Informatics Columbia University EMAIL Adèle H Ribeiro Institute of Medical Informatics University of Münster EMAIL Jin Tian Mohamed bin Zayed University of Artificial Intelligence EMAIL George Hripcsak Department of Biomedical Informatics Columbia University EMAIL Elias Bareinboim Causal Artificial Intelligence Laboratory Columbia University EMAIL
Pseudocode	Yes	Algorithm 1: CLOC: Algorithm for Learning an αC-CPDAG Algorithm 2: CLOC: Adjacencies and Independence Arcs Algorithm 3: CLOC: Separation and Connection Marks
Open Source Code	Yes	Algorithms are implemented in Python and implementation of CLOC and experiments are available at: https://github.com/Tara Anand/CLOC
Open Datasets	No	We generate random C-DAGs (3, 5, 6, 7, 8 clusters), and random DAGs (4, 8, 32, 64, 128, 256 variables) compatible with the C-DAGs. A Gaussian distribution (1000, 3000, 10000, 30000 samples) faithful to the DAG is drawn, over which PC-then-Cluster and CLOC are run.
Dataset Splits	No	The paper describes generating synthetic data (random C-DAGs and DAGs, Gaussian distribution with varying samples) for its experiments, but does not involve standard dataset splits like training, validation, or test sets in the context of model training.
Hardware Specification	Yes	All experiments were run on a machine with CPU: Intel i9 Chip, 32 GB of RAM, and mac OS operating system. A single core was used for the experiments.
Software Dependencies	No	Algorithms are implemented in Python and implementation of CLOC and experiments are available at: https://github.com/Tara Anand/CLOC. In our implementation of CLOC the multi-variate conditional independence test used iterates over pair-wise tests of variable level independence tests with early stopping when a dependence is determined implying dependence over clusters. For the latter method, we use the built-in implementation of PC in the python package causal-learn [27].
Experiment Setup	Yes	4 Experiments We show performance of CLOC in comparison to a PC-then-Cluster approach, where PC over the entire set of variables is run with variables then grouped by the partition, yielding a graph over clusters. We generate random C-DAGs (3, 5, 6, 7, 8 clusters), and random DAGs (4, 8, 32, 64, 128, 256 variables) compatible with the C-DAGs. A Gaussian distribution (1000, 3000, 10000, 30000 samples) faithful to the DAG is drawn, over which PC-then-Cluster and CLOC are run. Runtime, conditional independence test counts, and structural hamming distances between the resulting graphs of each method and the true C-DAG are shown in Figure 5. D.1 Experimental Setup All experiments were run on a machine with CPU: Intel i9 Chip, 32 GB of RAM, and mac OS operating system. A single core was used for the experiments. Algorithms are implemented in Python and implementation of CLOC and experiments are available at: https://github.com/Tara Anand/CLOC In our simulations, we compare two approaches to developing a clustered graphical equivalence class. The first approach consists of applying PC to the distribution over variables, P(V), and then imposing clusters. The clustering procedure is shown below. ... In our implementation of CLOC the multi-variate conditional independence test used iterates over pair-wise tests of variable level independence tests with early stopping when a dependence is determined implying dependence over clusters.
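The quoted setup describes a cluster-level conditional independence test that iterates over pairwise variable-level tests and stops early once a dependence is found. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a Fisher-z partial-correlation test, which suits Gaussian data but is not named in the excerpt, and the function names `fisher_z_pvalue` and `clusters_independent`, the `alpha` threshold, and the toy data are hypothetical.

```python
import math
import numpy as np


def fisher_z_pvalue(data, i, j, cond):
    """Two-sided p-value for X_i independent of X_j given X_cond.

    Assumption: Fisher-z test on the partial correlation (Gaussian data).
    """
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)  # precision matrix of the selected variables
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep the log below finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(data.shape[0] - len(cond) - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))


def clusters_independent(data, cluster_a, cluster_b, cond_clusters, alpha=0.05):
    """Test cluster A independent of cluster B given the conditioning clusters.

    Runs pairwise variable-level tests and stops at the first detected
    dependence. Returns (independent, number_of_tests_run).
    """
    cond = [v for cluster in cond_clusters for v in cluster]
    tests = 0
    for i in cluster_a:
        for j in cluster_b:
            tests += 1
            if fisher_z_pvalue(data, i, j, cond) < alpha:
                return False, tests  # early stop: clusters are dependent
    return True, tests


if __name__ == "__main__":
    # Hypothetical toy data: X0 -> X1 -> X2, with X3 unrelated.
    rng = np.random.default_rng(0)
    n = 3000
    x0 = rng.normal(size=n)
    x1 = 2.0 * x0 + rng.normal(size=n)
    x2 = 1.5 * x1 + rng.normal(size=n)
    x3 = rng.normal(size=n)
    data = np.column_stack([x0, x1, x2, x3])
    print(clusters_independent(data, [0, 3], [2], []))     # marginal test
    print(clusters_independent(data, [0, 3], [2], [[1]]))  # given cluster {X1}
```

The returned test count makes the effect of early stopping visible: a dependent pair found first ends the loop after a single test, while declaring two clusters independent requires testing every cross-cluster pair.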