BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery
Authors: Chris Cundy, Aditya Grover, Stefano Ermon
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate BCD Nets for causal discovery on a range of synthetic and real-world data benchmarks. Our experiments demonstrate that with finite datasets, there is considerable uncertainty in the inferred posterior over DAGs. |
| Researcher Affiliation | Collaboration | Chris Cundy (1), Aditya Grover (2, 3), Stefano Ermon (1); affiliations: 1. Department of Computer Science, Stanford University; 2. Facebook AI Research; 3. University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1: Bayesian Causal Discovery Nets (BCD Nets) |
| Open Source Code | Yes | Code is available at github.com/ermongroup/BCD-Nets |
| Open Datasets | Yes | We also evaluate on a benchmark protein signalling dataset [44]. |
| Dataset Splits | No | The paper mentions 'cross-validation' for tuning hyperparameters of a baseline model (GOLEM), but does not specify train/validation/test splits for its own experimental setup. |
| Hardware Specification | Yes | All the methods were run on the same hardware, a single Nvidia 2080Ti GPU with 16 CPUs. |
| Software Dependencies | No | The paper mentions various algorithms and models (e.g., Gumbel-Sinkhorn, Sinkhorn algorithm, AdaBelief optimizer, PyTorch) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Incorporating all the above design choices, we obtain our overall algorithm for BCD Nets. The pseudocode is shown in Algorithm 1. Here, qφ(L, Σ) is parameterized as either a normal distribution or a normalizing flow depending on the modeling assumption of equal or non-equal variances. The distribution qφ(P|L, Σ) is a Gumbel-Sinkhorn relaxation of the Gumbel-Matching distribution over permutations, parameterized by a function hφ conditioned on L, Σ. For the function hφ, we use a simple two-layer multi-layer perceptron. For stochastic optimization w.r.t. this distribution, we additionally find it useful to use the straight-through gradient estimator [4]. This means that on the forward pass of the backpropagation algorithm, we obtain the τ → 0 limiting value of the Sinkhorn relaxation using the Hungarian algorithm [27], giving a hard permutation P. On the backward pass the gradients are taken with respect to the finite-τ doubly-stochastic matrix P. The prior is given by p(P, L, Σ) = p(P)p(L)p(Σ), where p(L) is a horseshoe prior, p(P) is a uniform prior over permutations, and p(Σ) is a relatively uninformative Gaussian prior on log Σ. |
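The quoted setup describes a straight-through Gumbel-Sinkhorn step: the forward pass rounds the relaxed doubly-stochastic matrix to a hard permutation with the Hungarian algorithm, while the backward pass differentiates the finite-τ Sinkhorn relaxation. The sketch below illustrates this in JAX (the language of the released BCD-Nets code); it is not the authors' implementation, and the function names (`sinkhorn`, `hard_permutation`, `straight_through_loss`) and the toy objective are illustrative assumptions.

```python
# Minimal straight-through Gumbel-Sinkhorn sketch (illustrative, not the BCD Nets code).
import jax
import jax.numpy as jnp
import numpy as np
from scipy.optimize import linear_sum_assignment


def sinkhorn(log_alpha, n_iters=20):
    """Sinkhorn normalization in log space; returns a doubly-stochastic matrix."""
    for _ in range(n_iters):
        log_alpha = log_alpha - jax.nn.logsumexp(log_alpha, axis=1, keepdims=True)
        log_alpha = log_alpha - jax.nn.logsumexp(log_alpha, axis=0, keepdims=True)
    return jnp.exp(log_alpha)


def hard_permutation(scores):
    """tau -> 0 limit: hard permutation via the Hungarian algorithm (host-side, non-differentiable)."""
    row, col = linear_sum_assignment(-np.asarray(scores))  # maximize total score
    perm = np.zeros(scores.shape, dtype=np.float32)
    perm[row, col] = 1.0
    return jnp.asarray(perm)


def straight_through_loss(scores, p_hard, target, tau=0.5):
    """Toy objective: the forward value uses the hard permutation,
    while gradients flow through the finite-tau Sinkhorn relaxation."""
    p_soft = sinkhorn(scores / tau)
    p_st = p_soft + jax.lax.stop_gradient(p_hard - p_soft)  # straight-through estimator
    return jnp.sum((p_st - target) ** 2)


key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
d = 4
# Logits perturbed with Gumbel noise (a Gumbel-Matching sample).
logits = jax.random.normal(k1, (d, d))
gumbel = -jnp.log(-jnp.log(jax.random.uniform(k2, (d, d)) + 1e-20) + 1e-20)
scores = logits + gumbel

p_hard = hard_permutation(scores)   # concrete rounding, computed outside the differentiated function
target = jnp.eye(d)                 # arbitrary target for the toy loss
grads = jax.grad(straight_through_loss)(scores, p_hard, target)
print(p_hard)
print(grads.shape)
```

In this sketch the Hungarian rounding is computed on concrete values outside the differentiated function, so `jax.grad` only traverses the Sinkhorn relaxation; this mirrors the quoted design in which the hard permutation is used on the forward pass and the finite-τ doubly-stochastic matrix supplies the gradients.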