Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Causal Modelling Agents: Causal Graph Discovery through Synergising Metadata- and Data-driven Reasoning

Authors: Ahmed Abdulaal, Adamos Hadjivasiliou, Nina Montana-Brown, Tiantian He, Ayodeji Ijishakin, Ivana Drobnjak, Daniel C. Castro, Daniel C. Alexander

ICLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the CMA's performance on a number of benchmarks, as well as on the real-world task of modelling the clinical and radiological phenotype of Alzheimer's Disease (AD). Our experimental results indicate that the CMA can outperform previous purely data-driven or metadata-driven approaches to causal discovery. In our real-world application, we use the CMA to derive new insights into the causal relationships among biomarkers of AD. (See also Section 4, "Experiments".)
Researcher Affiliation	Collaboration	(1) Centre for Medical Image Computing, UCL, London, United Kingdom; (2) Microsoft Research, Cambridge
Pseudocode	Yes	Algorithm 1 Iterative procedure of the CMA Framework, Algorithm 2 Global Hypothesis Amendment, Algorithm 3 Local Hypothesis Precomputation Phase, Algorithm 4 Local Hypothesis Amendment, Algorithm 5 Post-processing and memory generation
Open Source Code	Yes	To increase reproducibility, we have included all implementation details in Appendix A.1. We also include implementation and prompting code at https://anonymous.4open.science/r/causal_modelling_agent-F443/.
Open Datasets	Yes	The Arctic sea ice dataset (Huang et al., 2021b) is from the field of atmospheric science and is an increasingly popular dataset for the task of full causal graph discovery (Kıcıman et al., 2023). This dataset considers the relations of several geophysical variables to sea ice thick- Table 4: Description of variables in Arctic Sea Ice dataset; The Sangiovese dataset is from the field of agricultural science and is a conditional linear Gaussian Bayesian Network from the popular bnlearn R package (Magrini et al., 2017). Table 8: Description of variables in Sangiovese dataset; The Alzheimer's dataset is another conditional linear Gaussian Bayesian Network that we developed in collaboration with 5 domain experts.
Dataset Splits	No	The paper describes training procedures and data characteristics but does not explicitly state training/validation/test splits, specific percentages, or how cross-validation was performed for the main experiments.
Hardware Specification	Yes	Experiments were parallelized across two NVIDIA RTX 3090 GPUs and one NVIDIA RTX 4090 GPU.
Software Dependencies	No	The paper mentions several software packages (e.g., NiLearn, ANTs, HD-BET, N4) and optimizers (AdamW) but does not specify their version numbers or other key software dependencies with versions.
Experiment Setup	Yes	All learnable flow parameters were optimized by maximizing the likelihood using the AdamW optimizer (You et al., 2019) with a learning rate of 3 × 10⁻³ for 300 epochs. and All learnable parameters in the flows and the CVAE architecture were optimised by a stochastic variational inference approach to estimate the evidence lower bound (ELBO; estimated using 4 Monte Carlo (MC) samples) using the Adam optimizer (Kingma & Ba, 2015) with learning rates of 10⁻⁵ and 5 × 10⁻³, respectively. For counterfactual inference, 32 MC samples were taken and the inference result was their average.
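
The Experiment Setup row reports that counterfactual inference averages 32 Monte Carlo samples and that the ELBO is estimated from 4. The averaging step can be sketched roughly as below. This is only an illustration using the Python standard library: `mc_average` and `toy_counterfactual` are hypothetical names, and the toy sampler is a stand-in for a real model draw, not the authors' implementation.

```python
import random
from statistics import fmean
from typing import Callable

# Sample counts as quoted in the Experiment Setup row.
ELBO_MC_SAMPLES = 4
COUNTERFACTUAL_MC_SAMPLES = 32


def mc_average(sample_fn: Callable[[random.Random], float],
               n_samples: int, seed: int = 0) -> float:
    """Return the mean of n_samples independent draws from sample_fn."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    return fmean(sample_fn(rng) for _ in range(n_samples))


def toy_counterfactual(rng: random.Random) -> float:
    # Hypothetical stand-in for one stochastic counterfactual prediction:
    # a true value of 2.0 plus Gaussian noise.
    return 2.0 + rng.gauss(0.0, 0.1)


estimate = mc_average(toy_counterfactual, COUNTERFACTUAL_MC_SAMPLES)
print(f"counterfactual estimate from {COUNTERFACTUAL_MC_SAMPLES} samples: {estimate:.3f}")
```

Fixing the seed is a design choice made here for repeatability; the table does not say whether the paper fixed or reported random seeds.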