Causal Modelling Agents: Causal Graph Discovery through Synergising Metadata- and Data-driven Reasoning

Authors: Ahmed Abdulaal, Adamos Hadjivasiliou, Nina Montana-Brown, Tiantian He, Ayodeji Ijishakin, Ivana Drobnjak, Daniel C. Castro, Daniel C. Alexander

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate the CMA's performance on a number of benchmarks, as well as on the real-world task of modelling the clinical and radiological phenotype of Alzheimer's Disease (AD). Our experimental results indicate that the CMA can outperform previous purely data-driven or metadata-driven approaches to causal discovery. In our real-world application, we use the CMA to derive new insights into the causal relationships among biomarkers of AD." and "4 EXPERIMENTS"
Researcher Affiliation | Collaboration | (1) Centre for Medical Image Computing, UCL, London, United Kingdom; (2) Microsoft Research, Cambridge
Pseudocode | Yes | Algorithm 1: Iterative procedure of the CMA Framework; Algorithm 2: Global Hypothesis Amendment; Algorithm 3: Local Hypothesis Precomputation Phase; Algorithm 4: Local Hypothesis Amendment; Algorithm 5: Post-processing and memory generation
Open Source Code | Yes | "To increase reproducibility, we have included all implementation details in Appendix A.1. We also include implementation and prompting code at https://anonymous.4open.science/r/causal_modelling_agent-F443/."
Open Datasets | Yes | "The Arctic sea ice dataset (Huang et al., 2021b) is from the field of atmospheric science and is an increasingly popular dataset for the task of full causal graph discovery (Kıcıman et al., 2023). This dataset considers the relations of several geophysical variables to sea ice thick-"; "Table 4: Description of variables in Arctic Sea Ice dataset"; "The Sangiovese dataset is from the field of agricultural science and is a conditional linear Gaussian Bayesian Network from the popular bnlearn R package (Magrini et al., 2017)."; "Table 8: Description of variables in Sangiovese dataset"; "The Alzheimer's dataset is another conditional linear Gaussian Bayesian Network that we developed in collaboration with 5 domain experts."
Dataset Splits | No | The paper describes training procedures and data characteristics but does not explicitly state training/validation/test splits, specific percentages, or how cross-validation was performed for the main experiments.
Hardware Specification | Yes | "Experiments were parallelized across two NVIDIA RTX 3090 GPUs and one NVIDIA RTX 4090 GPU."
Software Dependencies | No | The paper mentions several software packages (e.g., NiLearn, ANTs, HD-BET, N4) and an optimizer (AdamW) but does not give version numbers for them or for any other key dependency.
Experiment Setup | Yes | "All learnable flow parameters were optimized by maximizing the likelihood using the AdamW optimizer (You et al., 2019) with a learning rate of 3 × 10^-3 for 300 epochs." and "All learnable parameters in the flows and the CVAE architecture were optimised by a stochastic variational inference approach to estimate the evidence lower bound (ELBO; estimated using 4 Monte Carlo (MC) samples) using the Adam optimizer (Kingma & Ba, 2015) with learning rates of 10^-5 and 5 × 10^-3, respectively. For counterfactual inference, 32 MC samples were taken and the inference result was their average."
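
As a rough illustration of the first training setup quoted above (likelihood maximization with AdamW, learning rate 3 × 10^-3, 300 epochs), here is a minimal standard-library sketch. It is not the paper's code. The toy model (a unit-variance Gaussian with a single learnable mean), the function names `mean_nll` and `fit_mean_adamw`, the beta, epsilon, and weight-decay values, and the use of one full-batch step per epoch are all assumptions made for illustration. Only the learning rate and the epoch count come from the text.

```python
import math


def mean_nll(mu, data):
    """Mean negative log-likelihood of `data` under a Gaussian N(mu, 1)."""
    const = 0.5 * math.log(2 * math.pi)
    return sum(0.5 * (x - mu) ** 2 + const for x in data) / len(data)


def fit_mean_adamw(data, lr=3e-3, epochs=300, beta1=0.9, beta2=0.999,
                   eps=1e-8, weight_decay=0.01):
    """Fit the mean of N(mu, 1) by minimizing the mean NLL with AdamW.

    lr and epochs follow the quoted setup; every other hyperparameter
    is an assumed default, not a value reported in the paper.
    """
    mu = 0.0      # learnable parameter, initialized at zero (assumption)
    m = 0.0       # first-moment (mean) estimate of the gradient
    v = 0.0       # second-moment (uncentered variance) estimate
    for t in range(1, epochs + 1):
        # Gradient of the mean NLL with respect to mu (one full-batch step).
        grad = sum(mu - x for x in data) / len(data)
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
        # Decoupled weight decay: applied to the parameter directly,
        # not folded into the gradient (this is what separates AdamW from Adam).
        mu -= lr * weight_decay * mu
        mu -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return mu


if __name__ == "__main__":
    toy_data = [0.2, 0.4, 0.6, 0.8]   # hypothetical observations
    fitted = fit_mean_adamw(toy_data)
    print("fitted mean:", fitted)
    print("mean NLL before:", mean_nll(0.0, toy_data))
    print("mean NLL after:", mean_nll(fitted, toy_data))
```

Minimizing the mean negative log-likelihood is equivalent to maximizing the likelihood. In a real normalizing-flow setting the scalar `mu` would be replaced by the flow's parameters and the gradient would come from automatic differentiation.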