Causal Modelling Agents: Causal Graph Discovery through Synergising Metadata- and Data-driven Reasoning
Authors: Ahmed Abdulaal, Adamos Hadjivasiliou, Nina Montana-Brown, Tiantian He, Ayodeji Ijishakin, Ivana Drobnjak, Daniel C. Castro, Daniel C. Alexander
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the CMA's performance on a number of benchmarks, as well as on the real-world task of modelling the clinical and radiological phenotype of Alzheimer's Disease (AD). Our experimental results indicate that the CMA can outperform previous purely data-driven or metadata-driven approaches to causal discovery. In our real-world application, we use the CMA to derive new insights into the causal relationships among biomarkers of AD. and 4 EXPERIMENTS |
| Researcher Affiliation | Collaboration | ¹Centre for Medical Image Computing, UCL, London, United Kingdom; ²Microsoft Research, Cambridge |
| Pseudocode | Yes | Algorithm 1 Iterative procedure of the CMA Framework, Algorithm 2 Global Hypothesis Amendment, Algorithm 3 Local Hypothesis Precomputation Phase, Algorithm 4 Local Hypothesis Amendment, Algorithm 5 Post-processing and memory generation |
| Open Source Code | Yes | To increase reproducibility, we have included all implementation details in Appendix A.1. We also include implementation and prompting code at https://anonymous.4open.science/r/causal_modelling_agent-F443/. |
| Open Datasets | Yes | The Arctic sea ice dataset (Huang et al., 2021b) is from the field of atmospheric science and is an increasingly popular dataset for the task of full causal graph discovery (Kıcıman et al., 2023). This dataset considers the relations of several geophysical variables to sea ice thickness. Table 4: Description of variables in Arctic Sea Ice dataset. The Sangiovese dataset is from the field of agricultural science and is a conditional linear Gaussian Bayesian Network from the popular bnlearn R package (Magrini et al., 2017). Table 8: Description of variables in Sangiovese dataset. The Alzheimer's dataset is another conditional linear Gaussian Bayesian Network that we developed in collaboration with 5 domain experts. |
| Dataset Splits | No | The paper describes training procedures and data characteristics but does not explicitly state training/validation/test splits, specific percentages, or how cross-validation was performed for the main experiments. |
| Hardware Specification | Yes | Experiments were parallelized across two NVIDIA RTX 3090 GPUs and one NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions several software packages (e.g., NiLearn, ANTs, HD-BET, N4) and optimizers (AdamW) but does not specify their version numbers or other key software dependencies with versions. |
| Experiment Setup | Yes | All learnable flow parameters were optimized by maximizing the likelihood using the AdamW optimizer (You et al., 2019) with a learning rate of $3 \times 10^{-3}$ for 300 epochs. and All learnable parameters in the flows and the CVAE architecture were optimised by a stochastic variational inference approach to estimate the evidence lower bound (ELBO; estimated using 4 Monte Carlo (MC) samples) using the Adam optimizer (Kingma & Ba, 2015) with learning rates of $10^{-5}$ and $5 \times 10^{-3}$, respectively. For counterfactual inference, 32 MC samples were taken and the inference result was their average. |
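
To make the reported flow-training setup concrete, the following is a minimal sketch of maximum-likelihood fitting with AdamW at a learning rate of 3e-3 for 300 epochs. It is not the authors' implementation: the one-dimensional affine flow, the full-batch update per epoch, the weight decay of 0.01, the beta and epsilon values, and all function names (`mean_nll`, `nll_gradients`, `fit_affine_flow`) are assumptions chosen for illustration. Only the optimizer family, the learning rate, and the epoch count come from the table above.

```python
import math
import random


def mean_nll(data, mu, log_sigma):
    """Mean negative log-likelihood under a 1-D affine flow to N(0, 1).

    The flow maps x to z = (x - mu) / exp(log_sigma), so
    -log p(x) = 0.5 * z^2 + log_sigma + 0.5 * log(2 * pi).
    """
    total = 0.0
    for x in data:
        z = (x - mu) * math.exp(-log_sigma)
        total += 0.5 * z * z + log_sigma + 0.5 * math.log(2 * math.pi)
    return total / len(data)


def nll_gradients(data, mu, log_sigma):
    """Analytic gradients of the mean NLL with respect to (mu, log_sigma)."""
    inv_sigma = math.exp(-log_sigma)
    g_mu = 0.0
    g_log_sigma = 0.0
    for x in data:
        z = (x - mu) * inv_sigma
        g_mu += -z * inv_sigma        # d/dmu of 0.5 * z^2
        g_log_sigma += 1.0 - z * z    # d/dlog_sigma of 0.5 * z^2 + log_sigma
    n = len(data)
    return [g_mu / n, g_log_sigma / n]


def fit_affine_flow(data, lr=3e-3, epochs=300, weight_decay=0.01,
                    betas=(0.9, 0.999), eps=1e-8):
    """Fit (mu, log_sigma) by minimizing the NLL with a hand-written AdamW.

    lr and epochs follow the reported setup; weight_decay, betas and eps
    are assumed values, and one full-batch step is taken per epoch.
    """
    params = [0.0, 0.0]  # [mu, log_sigma]
    m = [0.0, 0.0]       # first-moment estimates
    v = [0.0, 0.0]       # second-moment estimates
    for t in range(1, epochs + 1):
        grads = nll_gradients(data, params[0], params[1])
        for i, g in enumerate(grads):
            m[i] = betas[0] * m[i] + (1 - betas[0]) * g
            v[i] = betas[1] * v[i] + (1 - betas[1]) * g * g
            m_hat = m[i] / (1 - betas[0] ** t)  # bias correction
            v_hat = v[i] / (1 - betas[1] ** t)
            params[i] -= lr * weight_decay * params[i]  # decoupled weight decay
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params[0], params[1]


if __name__ == "__main__":
    random.seed(0)
    samples = [random.gauss(0.5, 2.0) for _ in range(500)]  # synthetic data
    mu, log_sigma = fit_affine_flow(samples)
    print("NLL before:", mean_nll(samples, 0.0, 0.0))
    print("NLL after: ", mean_nll(samples, mu, log_sigma))
```

Because each Adam step is bounded by roughly the learning rate, 300 steps at 3e-3 can move a parameter by at most about 0.9 from its initial value, which is worth keeping in mind when choosing initializations for a short schedule like this one.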