Learning to Induce Causal Structure

Authors: Nan Rosemary Ke, Silvia Chiappa, Jane X Wang, Jorg Bornschein, Anirudh Goyal, Melanie Rey, Theophane Weber, Matthew Botvinick, Michael Curtis Mozer, Danilo Jimenez Rezende

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The learned model generalizes to new synthetic graphs, is robust to train-test distribution shifts, and achieves state-of-the-art performance on naturalistic graphs for low sample complexity. We show that CSIvA significantly outperforms state-of-the-art causal structure induction methods such as DCDI (Brouillard et al., 2020) and ENCO (Lippe et al., 2021), both on various types of synthetic CBNs and on naturalistic CBNs. We report on a series of experiments of increasing challenge to our supervised approach to causal structure induction.
Researcher Affiliation | Collaboration | 1 DeepMind, 2 Mila, 3 Polytechnique Montréal, 4 University of Montreal, 5 Google Research, Brain Team. Corresponding author: nke@google.com
Pseudocode | No | The paper describes the model architecture and data generation process in narrative text and figures, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the described methodology or a link to a code repository.
Open Datasets | Yes | We show that CSIvA generalizes well to data from naturalistic CBNs even if trained on synthetic data with relatively few assumptions, where naturalistic CBNs are graphs that correspond to causal relationships that exist in nature, such as graphs from the bnlearn repository (www.bnlearn.com/bnrepository). We show that our model can learn a mapping from datasets to structures and achieves state-of-the-art performance on classic benchmarks such as the Sachs, Asia and Child datasets (Lauritzen & Spiegelhalter, 1988; Sachs et al., 2005; Spiegelhalter & Cowell, 1992), despite never directly being trained on such data. (See the data-sampling sketch after the table.)
Dataset Splits | No | The paper mentions separate training and testing datasets with specific counts, but does not explicitly provide details about a validation set or specific percentages for train/validation/test splits.
Hardware Specification | No | The paper states that "All models (baselines and our model) are trained on GPUs" but does not specify any particular GPU models, CPU models, memory, or specific cloud/cluster resources used for the experiments.
Software Dependencies | No | The paper mentions the use of the Adam optimizer, but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or other dependencies.
Experiment Setup | Yes | For all of our experiments (unless otherwise stated) our model was trained on I = 15,000 (for graphs N ≤ 20) and on I = 20,000 (for graphs N > 20) pairs {(D_i, A_i)}_{i=1}^{I}, where each dataset D_i contained S = 1500 observational and interventional samples. The model was trained for 500,000 iterations using the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-4. Table 3 lists the hyperparameters: hidden state dimension 64; encoder transformer layers 8; number of attention heads 8; optimizer Adam; learning rate 10^-4; number of random seeds 3; S (number of samples) 1500; training iterations 500,000; number of training datasets I = 15,000 (N ≤ 20) or 20,000 (N > 20). (See the configuration sketch immediately after the table.)
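
The reported training setup can be collected into a single settings object for reference. The sketch below is illustrative only: the paper releases no code, so the class and field names (e.g., CSIvAConfig, num_training_datasets) are assumptions; the values are taken directly from the Table 3 hyperparameters quoted above.

```python
from dataclasses import dataclass

@dataclass
class CSIvAConfig:
    """Hypothetical container for the hyperparameters reported in Table 3 of the paper."""
    hidden_dim: int = 64                       # hidden state dimension
    encoder_layers: int = 8                    # encoder transformer layers
    num_attention_heads: int = 8
    optimizer: str = "adam"
    learning_rate: float = 1e-4
    num_random_seeds: int = 3
    samples_per_dataset: int = 1500            # S: observational + interventional samples per dataset D_i
    training_iterations: int = 500_000
    num_training_datasets_small: int = 15_000  # I for graphs with N <= 20 nodes
    num_training_datasets_large: int = 20_000  # I for graphs with N > 20 nodes


def num_training_datasets(num_nodes: int, cfg: CSIvAConfig = CSIvAConfig()) -> int:
    """Return the reported number of training (dataset, adjacency) pairs for a given graph size."""
    return cfg.num_training_datasets_small if num_nodes <= 20 else cfg.num_training_datasets_large


if __name__ == "__main__":
    cfg = CSIvAConfig()
    print(cfg.learning_rate, num_training_datasets(30, cfg))  # 0.0001 20000
```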
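As a rough illustration of how observational samples can be drawn from the naturalistic CBNs mentioned above (Asia, Sachs, Child), the sketch below loads a network in BIF format and draws forward samples with pgmpy. The use of pgmpy and the local file asia.bif (downloaded from the bnlearn repository) are assumptions; this is not the authors' data pipeline.

```python
# Minimal sketch: draw observational samples from a bnlearn-style network.
# Assumes `pip install pgmpy` and that asia.bif has been downloaded from
# www.bnlearn.com/bnrepository; this is not the paper's own data pipeline.
from pgmpy.readwrite import BIFReader
from pgmpy.sampling import BayesianModelSampling

reader = BIFReader("asia.bif")      # hypothetical local path to the network file
model = reader.get_model()          # reconstruct the Bayesian network from the BIF description

sampler = BayesianModelSampling(model)
observational = sampler.forward_sample(size=1500)  # S = 1500 samples, matching the paper's setup
print(observational.head())
```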