BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery

Authors: Chris Cundy, Aditya Grover, Stefano Ermon

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate BCD Nets for causal discovery on a range of synthetic and real-world data benchmarks. Our experiments demonstrate that with finite datasets, there is considerable uncertainty in the inferred posterior over DAGs.
Researcher Affiliation | Collaboration | Chris Cundy (1), Aditya Grover (2, 3), Stefano Ermon (1); (1) Department of Computer Science, Stanford University; (2) Facebook AI Research; (3) University of California, Los Angeles
Pseudocode | Yes | Algorithm 1: Bayesian Causal Discovery Nets (BCD Nets)
Open Source Code | Yes | Code is available at github.com/ermongroup/BCD-Nets
Open Datasets | Yes | We also evaluate on a benchmark protein signalling dataset [44].
Dataset Splits | No | The paper mentions 'cross-validation' for tuning hyperparameters of a baseline model (GOLEM), but does not specify train/validation/test splits for its own experimental setup.
Hardware Specification | Yes | All the methods were run on the same hardware, a single Nvidia 2080Ti GPU with 16 CPUs.
Software Dependencies | No | The paper mentions various algorithms and models (e.g., Gumbel-Sinkhorn, Sinkhorn algorithm, AdaBelief optimizer, PyTorch) but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Incorporating all the above design choices, we obtain our overall algorithm for BCD Nets. The pseudocode is shown in Algorithm 1. Here, qφ(L, Σ) is parameterized as either a normal distribution or a normalizing flow, depending on the modeling assumption of equal or non-equal variances. The distribution qφ(P|L, Σ) is a Gumbel-Sinkhorn relaxation of the Gumbel-Matching distribution over permutations, parameterized by a function hφ conditioned on L, Σ. For the function hφ, we use a simple two-layer multi-layer perceptron. For stochastic optimization w.r.t. this distribution, we additionally find it useful to use the straight-through gradient estimator [4]. This means that on the forward pass of the backpropagation algorithm, we obtain the τ → 0 limiting value of the Sinkhorn relaxation using the Hungarian algorithm [27], giving a hard permutation P. On the backward pass the gradients are taken with respect to the finite-τ doubly-stochastic matrix P. The prior is given by p(P, L, Σ) = p(P)p(L)p(Σ), where p(L) is a horseshoe prior, p(P) is a uniform prior over permutations, and p(Σ) is a relatively uninformative Gaussian prior on log Σ.
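To make the straight-through Gumbel-Sinkhorn step in the quoted setup concrete, the sketch below samples a permutation in JAX: Gumbel noise is added to the logits produced by hφ, a Sinkhorn loop gives the finite-τ doubly-stochastic matrix used for gradients, and the Hungarian algorithm gives the hard permutation used on the forward pass. This is a minimal illustration under assumed details, not the released BCD Nets implementation; the function names, temperature, and iteration count are illustrative.

```python
# Minimal straight-through Gumbel-Sinkhorn sketch (illustrative; assumed
# temperature and iteration count, not taken from the BCD Nets codebase).
import numpy as np
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp
from scipy.optimize import linear_sum_assignment


def sinkhorn(log_alpha, tau=1.0, n_iters=20):
    """Gumbel-Sinkhorn relaxation: alternately normalise rows and columns
    of exp(log_alpha / tau) in log-space, approaching a doubly-stochastic matrix."""
    log_p = log_alpha / tau
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1, keepdims=True)  # row normalisation
        log_p = log_p - logsumexp(log_p, axis=0, keepdims=True)  # column normalisation
    return jnp.exp(log_p)


def sample_permutation_st(log_alpha, key, tau=1.0):
    """Straight-through sample: hard permutation (Hungarian algorithm) on the
    forward pass, gradients taken w.r.t. the finite-tau Sinkhorn matrix.
    Not jit-compatible as written, since it calls SciPy on concrete arrays."""
    u = jax.random.uniform(key, log_alpha.shape)
    gumbel = -jnp.log(-jnp.log(u + 1e-20) + 1e-20)
    noisy = log_alpha + gumbel
    p_soft = sinkhorn(noisy, tau)
    # tau -> 0 limit: maximise the total noisy score with the Hungarian algorithm.
    rows, cols = linear_sum_assignment(-np.asarray(noisy))
    p_hard = jnp.zeros_like(p_soft).at[rows, cols].set(1.0)
    # Forward value is p_hard; the backward pass sees only p_soft.
    return p_soft + jax.lax.stop_gradient(p_hard - p_soft)


if __name__ == "__main__":
    key = jax.random.PRNGKey(0)
    key, subkey = jax.random.split(key)
    log_alpha = jax.random.normal(subkey, (5, 5))  # stand-in for the output of h_phi(L, Sigma)
    P = sample_permutation_st(log_alpha, key, tau=0.5)
    print(P.round(2))  # a hard permutation matrix on the forward pass
```

The `p_soft + stop_gradient(p_hard - p_soft)` line is the standard straight-through trick: the sampled value equals the hard permutation, while gradients flow through the relaxed doubly-stochastic matrix, matching the forward/backward split described in the quoted setup.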