Interventions, Where and How? Experimental Design for Causal Models at Scale

Authors: Panagiotis Tigas, Yashas Annadani, Andrew Jesson, Bernhard Schölkopf, Yarin Gal, Stefan Bauer

NeurIPS 2022

Reproducibility Variable — Result — LLM Response
Research Type: Experimental — "We demonstrate the performance of the proposed method on synthetic graphs (Erdős–Rényi, Scale-Free) for both linear and nonlinear SCMs as well as on the in-silico single-cell gene regulatory network dataset, DREAM. ... We show that our methods, Greedy-CBED and Soft-CBED, perform better than the state-of-the-art active causal discovery baselines in linear and nonlinear SCM settings. In addition, our approach achieves superior results on the real-world-inspired nonlinear dataset, DREAM (Greenfield et al., 2010)."
Researcher Affiliation: Academia — OATML, University of Oxford; KTH Royal Institute of Technology, Stockholm; Max Planck Institute for Intelligent Systems; CIFAR Azrieli Global Scholar
Pseudocode: Yes — "Algorithm 1: Greedy-CBED ... Algorithm 2: Soft-CBED"
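For intuition, here is a minimal sketch of the greedy batch-selection structure that Algorithm 1 describes. It is a hedged illustration only: `utility_fn` and the toy utility below are stand-ins for the paper's mutual-information-based acquisition objective, not its actual estimator.

```python
import numpy as np

def greedy_batch(candidates, utility_fn, batch_size):
    """Greedily build a batch of interventions: at each step, add the
    candidate whose inclusion maximizes the utility of the batch so far.
    Mirrors the greedy structure of Algorithm 1 with a generic utility."""
    batch, remaining = [], list(candidates)
    for _ in range(batch_size):
        scores = [utility_fn(batch + [c]) for c in remaining]
        batch.append(remaining.pop(int(np.argmax(scores))))
    return batch

# Toy usage: candidates are (node, value) interventions; the dummy utility
# simply rewards covering distinct nodes (an illustrative assumption).
candidates = [(node, value) for node in range(5) for value in (-1.0, 1.0)]
utility = lambda batch: len({node for node, _ in batch})
print(greedy_batch(candidates, utility, batch_size=3))
```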
Open Source Code: Yes — Implementation available at https://github.com/yannadani/cbed
Open Datasets: Yes — "In this setting, we generate Erdős–Rényi (Erdős and Rényi, 1959) (ER) and Scale-Free (SF) graphs (Barabási and Albert, 1999) of size 20 and 50. ... We use GeneNetWeaver (Schaffter et al., 2011) to simulate the steady-state wild-type expression and single-gene knockouts. ... The DREAM family of benchmarks (Greenfield et al., 2010) is designed to evaluate causal discovery algorithms on the regulatory networks of a single cell."
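As a hedged illustration of the graph generation quoted above, the sketch below draws Erdős–Rényi and scale-free skeletons with networkx. The edge probability `p` and the DAG-orientation step are assumptions about the pipeline, not details quoted from the paper.

```python
import networkx as nx
import numpy as np

d = 20                 # number of variables; the paper uses d = 20 and d = 50
p = 2.0 / (d - 1)      # assumed edge probability giving ~1 expected edge per vertex

er = nx.erdos_renyi_graph(d, p, seed=0)        # Erdős–Rényi skeleton
sf = nx.barabasi_albert_graph(d, 1, seed=0)    # scale-free skeleton, 1 edge per new node

# Orient edges along a random permutation to obtain a DAG (an assumption
# about the exact procedure; the quoted passage does not specify it).
order = np.random.default_rng(0).permutation(d)
rank = {node: i for i, node in enumerate(order)}
dag = nx.DiGraph([(u, v) if rank[u] < rank[v] else (v, u) for u, v in er.edges])
```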
Dataset Splits: No — The paper describes synthetic data generation and the use of the DREAM dataset. It reports the numbers of observational and interventional samples, but it specifies no traditional train/validation/test splits (e.g., percentages or exact counts): the active learning setting acquires data sequentially rather than partitioning a static dataset.
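The sequential-acquisition protocol implied by this entry can be sketched as a simple outer loop. Everything below is a hedged stand-in: `fit_posterior`, `score`, and `simulate_intervention` are hypothetical stubs, not functions from the paper's codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_posterior(data):
    return {"n": len(data)}              # stand-in for a posterior over SCMs

def score(posterior, candidate):
    return rng.random()                  # stand-in utility, e.g., an estimated EIG

def simulate_intervention(candidate, n=10):
    return [(candidate, rng.normal()) for _ in range(n)]

data = [(None, rng.normal()) for _ in range(100)]   # initial observational samples
candidates = list(range(20))                        # nodes available to intervene on
for _ in range(5):                                  # acquisition rounds
    posterior = fit_posterior(data)
    best = max(candidates, key=lambda c: score(posterior, c))
    data += simulate_intervention(best)             # dataset grows round by round
```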
Hardware Specification: Yes — "All experiments were run on an AMD EPYC 7662 64-Core CPU and Tesla V100 GPU for computing the results presented in the paper."
Software Dependencies: No — The paper mentions PyTorch and refers to tools such as DiBS and GeneNetWeaver, but it does not specify exact version numbers for these or other software libraries (e.g., Python, NumPy, SciPy) required for replication.
Experiment Setup: Yes — "For the nonlinear SCM, we parameterize each variable to be a Gaussian whose mean is a nonlinear function of its parents. We model the nonlinear function with a neural network. In all settings, we set noise variance σ² = 0.1. For both types of graphs, we set the expected number of edges per vertex to 1. ... Refer to appendix D.2 for the exact settings." Appendix H (Hyperparameters) provides specific values used in the experiments, including learning rates, epochs, and batch sizes.
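The nonlinear SCM quoted above can be sketched directly: each variable is sampled as a Gaussian whose mean is a small neural network of its parents, with noise variance 0.1. The hidden width and the toy chain graph below are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

sigma2 = 0.1  # noise variance from the quoted setup

def sample_nonlinear_scm(adj, n_samples, hidden=16):
    """adj: (d, d) 0/1 tensor with adj[i, j] = 1 for an edge i -> j;
    nodes are assumed to be listed in topological order."""
    d = adj.shape[0]
    # One randomly initialized MLP per node (width is an assumption).
    mlps = [
        nn.Sequential(nn.Linear(max(int(adj[:, j].sum()), 1), hidden),
                      nn.ReLU(), nn.Linear(hidden, 1))
        for j in range(d)
    ]
    x = torch.zeros(n_samples, d)
    for j in range(d):
        parents = adj[:, j].nonzero(as_tuple=True)[0]
        mean = mlps[j](x[:, parents]) if len(parents) > 0 else torch.zeros(n_samples, 1)
        x[:, j] = (mean + sigma2 ** 0.5 * torch.randn(n_samples, 1)).squeeze(-1)
    return x

# Toy usage: a 3-node chain 0 -> 1 -> 2.
adj = torch.tensor([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=torch.float32)
samples = sample_nonlinear_scm(adj, n_samples=5)
```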