Interventions, Where and How? Experimental Design for Causal Models at Scale

Authors: Panagiotis Tigas, Yashas Annadani, Andrew Jesson, Bernhard Schölkopf, Yarin Gal, Stefan Bauer

NeurIPS 2022

Reproducibility Variable — Result — LLM Response
Research Type: Experimental — "We demonstrate the performance of the proposed method on synthetic graphs (Erdős–Rényi, Scale-Free) for both linear and nonlinear SCMs as well as on the in-silico single-cell gene regulatory network dataset, DREAM. ... We show that our methods, Greedy-CBED and Soft-CBED, perform better than the state-of-the-art active causal discovery baselines in linear and nonlinear SCM settings. In addition, our approach achieves superior results on the real-world-inspired nonlinear dataset, DREAM (Greenfield et al., 2010)."
Researcher Affiliation: Academia — OATML, University of Oxford; KTH Royal Institute of Technology, Stockholm; Max Planck Institute for Intelligent Systems; CIFAR Azrieli Global Scholar
Pseudocode: Yes — "Algorithm 1: Greedy-CBED ... Algorithm 2: Soft-CBED"
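For intuition, here is a minimal sketch of the greedy batch-selection structure that Algorithm 1 describes. It is a hedged illustration only: `utility_fn` and the toy utility below are stand-ins for the paper's mutual-information-based acquisition objective, not its actual estimator.

```python
import numpy as np

def greedy_batch(candidates, utility_fn, batch_size):
    """Greedily build a batch of interventions: at each step, add the
    candidate whose inclusion maximizes the utility of the batch so far.
    Mirrors the greedy structure of Algorithm 1 with a generic utility."""
    batch, remaining = [], list(candidates)
    for _ in range(batch_size):
        scores = [utility_fn(batch + [c]) for c in remaining]
        batch.append(remaining.pop(int(np.argmax(scores))))
    return batch

# Toy usage: candidates are (node, value) interventions; the dummy utility
# simply rewards covering distinct nodes (an illustrative assumption).
candidates = [(node, value) for node in range(5) for value in (-1.0, 1.0)]
utility = lambda batch: len({node for node, _ in batch})
print(greedy_batch(candidates, utility, batch_size=3))
```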
Open Source Code: Yes — Implementation available at https://github.com/yannadani/cbed
Open Datasets: Yes — "In this setting, we generate Erdős–Rényi (Erdős and Rényi, 1959) (ER) and Scale-Free (SF) graphs (Barabási and Albert, 1999) of size 20 and 50. ... We use GeneNetWeaver (Schaffter et al., 2011) to simulate the steady-state wild-type expression and single-gene knockouts. ... The DREAM family of benchmarks (Greenfield et al., 2010) is designed to evaluate causal discovery algorithms on the regulatory networks of a single cell."
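As a hedged illustration of the graph generation quoted above, the sketch below draws Erdős–Rényi and scale-free skeletons with networkx. The edge probability `p` and the DAG-orientation step are assumptions about the pipeline, not details quoted from the paper.

```python
import networkx as nx
import numpy as np

d = 20                 # number of variables; the paper uses d = 20 and d = 50
p = 2.0 / (d - 1)      # assumed edge probability giving ~1 expected edge per vertex

er = nx.erdos_renyi_graph(d, p, seed=0)        # Erdős–Rényi skeleton
sf = nx.barabasi_albert_graph(d, 1, seed=0)    # scale-free skeleton, 1 edge per new node

# Orient edges along a random permutation to obtain a DAG (an assumption
# about the exact procedure; the quoted passage does not specify it).
order = np.random.default_rng(0).permutation(d)
rank = {node: i for i, node in enumerate(order)}
dag = nx.DiGraph([(u, v) if rank[u] < rank[v] else (v, u) for u, v in er.edges])
```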
Dataset Splits: No — The paper describes synthetic data generation and the use of the DREAM dataset. It reports the numbers of observational and interventional samples, but it specifies no traditional train/validation/test splits (e.g., percentages or exact counts): the active learning setting acquires data sequentially rather than partitioning a static dataset.
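The sequential-acquisition protocol implied by this entry can be sketched as a simple outer loop. Everything below is a hedged stand-in: `fit_posterior`, `score`, and `simulate_intervention` are hypothetical stubs, not functions from the paper's codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_posterior(data):
    return {"n": len(data)}              # stand-in for a posterior over SCMs

def score(posterior, candidate):
    return rng.random()                  # stand-in utility, e.g., an estimated EIG

def simulate_intervention(candidate, n=10):
    return [(candidate, rng.normal()) for _ in range(n)]

data = [(None, rng.normal()) for _ in range(100)]   # initial observational samples
candidates = list(range(20))                        # nodes available to intervene on
for _ in range(5):                                  # acquisition rounds
    posterior = fit_posterior(data)
    best = max(candidates, key=lambda c: score(posterior, c))
    data += simulate_intervention(best)             # dataset grows round by round
```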
Hardware Specification: Yes — "All experiments were run on an AMD EPYC 7662 64-Core CPU and Tesla V100 GPU for computing the results presented in the paper."
Software Dependencies: No — The paper mentions PyTorch and refers to tools such as DiBS and GeneNetWeaver, but it does not specify exact version numbers for these or other software libraries (e.g., Python, NumPy, SciPy) required for replication.
Experiment Setup: Yes — "For the nonlinear SCM, we parameterize each variable to be a Gaussian whose mean is a nonlinear function of its parents. We model the nonlinear function with a neural network. In all settings, we set noise variance σ² = 0.1. For both types of graphs, we set the expected number of edges per vertex to 1. ... Refer to appendix D.2 for the exact settings." Appendix H (Hyperparameters) provides specific values used in the experiments, including learning rates, epochs, and batch sizes.
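The nonlinear SCM quoted above can be sketched directly: each variable is sampled as a Gaussian whose mean is a small neural network of its parents, with noise variance 0.1. The hidden width and the toy chain graph below are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

sigma2 = 0.1  # noise variance from the quoted setup

def sample_nonlinear_scm(adj, n_samples, hidden=16):
    """adj: (d, d) 0/1 tensor with adj[i, j] = 1 for an edge i -> j;
    nodes are assumed to be listed in topological order."""
    d = adj.shape[0]
    # One randomly initialized MLP per node (width is an assumption).
    mlps = [
        nn.Sequential(nn.Linear(max(int(adj[:, j].sum()), 1), hidden),
                      nn.ReLU(), nn.Linear(hidden, 1))
        for j in range(d)
    ]
    x = torch.zeros(n_samples, d)
    for j in range(d):
        parents = adj[:, j].nonzero(as_tuple=True)[0]
        mean = mlps[j](x[:, parents]) if len(parents) > 0 else torch.zeros(n_samples, 1)
        x[:, j] = (mean + sigma2 ** 0.5 * torch.randn(n_samples, 1)).squeeze(-1)
    return x

# Toy usage: a 3-node chain 0 -> 1 -> 2.
adj = torch.tensor([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=torch.float32)
samples = sample_nonlinear_scm(adj, n_samples=5)
```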