BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery

Authors: Chris Cundy, Aditya Grover, Stefano Ermon

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate BCD Nets for causal discovery on a range of synthetic and real-world data benchmarks. Our experiments demonstrate that with finite datasets, there is considerable uncertainty in the inferred posterior over DAGs.
Researcher Affiliation | Collaboration | Chris Cundy (1), Aditya Grover (2, 3), Stefano Ermon (1); (1) Department of Computer Science, Stanford University; (2) Facebook AI Research; (3) University of California, Los Angeles
Pseudocode | Yes | Algorithm 1: Bayesian Causal Discovery Nets (BCD Nets)
Open Source Code | Yes | Code is available at github.com/ermongroup/BCD-Nets
Open Datasets | Yes | We also evaluate on a benchmark protein signalling dataset [44].
Dataset Splits | No | The paper mentions 'cross-validation' for tuning hyperparameters of a baseline model (GOLEM), but does not specify train/validation/test splits for its own experimental setup.
Hardware Specification | Yes | All the methods were run on the same hardware, a single Nvidia 2080Ti GPU with 16 CPUs.
Software Dependencies | No | The paper mentions various algorithms and models (e.g., Gumbel-Sinkhorn, Sinkhorn algorithm, AdaBelief optimizer, PyTorch) but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Incorporating all the above design choices, we obtain our overall algorithm for BCD Nets. The pseudocode is shown in Algorithm 1. Here, qφ(L, Σ) is parameterized as either a normal distribution or a normalizing flow, depending on the modeling assumption of equal or non-equal variances. The distribution qφ(P|L, Σ) is a Gumbel-Sinkhorn relaxation of the Gumbel-Matching distribution over permutations, parameterized by a function hφ conditioned on L, Σ. For the function hφ, we use a simple two-layer multi-layer perceptron. For stochastic optimization w.r.t. this distribution, we additionally find it useful to use the straight-through gradient estimator [4]. This means that on the forward pass of the backpropagation algorithm, we obtain the τ → 0 limiting value of the Sinkhorn relaxation using the Hungarian algorithm [27], giving a hard permutation P. On the backward pass the gradients are taken with respect to the finite-τ doubly-stochastic matrix P. The prior is given by p(P, L, Σ) = p(P)p(L)p(Σ), where p(L) is a horseshoe prior, p(P) is a uniform prior over permutations, and p(Σ) is a relatively uninformative Gaussian prior on log Σ.
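To make the straight-through Gumbel-Sinkhorn step in the quoted setup concrete, the sketch below samples a permutation in JAX: Gumbel noise is added to the logits produced by hφ, a Sinkhorn loop gives the finite-τ doubly-stochastic matrix used for gradients, and the Hungarian algorithm gives the hard permutation used on the forward pass. This is a minimal illustration under assumed details, not the released BCD Nets implementation; the function names, temperature, and iteration count are illustrative.

```python
# Minimal straight-through Gumbel-Sinkhorn sketch (illustrative; assumed
# temperature and iteration count, not taken from the BCD Nets codebase).
import numpy as np
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp
from scipy.optimize import linear_sum_assignment


def sinkhorn(log_alpha, tau=1.0, n_iters=20):
    """Gumbel-Sinkhorn relaxation: alternately normalise rows and columns
    of exp(log_alpha / tau) in log-space, approaching a doubly-stochastic matrix."""
    log_p = log_alpha / tau
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1, keepdims=True)  # row normalisation
        log_p = log_p - logsumexp(log_p, axis=0, keepdims=True)  # column normalisation
    return jnp.exp(log_p)


def sample_permutation_st(log_alpha, key, tau=1.0):
    """Straight-through sample: hard permutation (Hungarian algorithm) on the
    forward pass, gradients taken w.r.t. the finite-tau Sinkhorn matrix.
    Not jit-compatible as written, since it calls SciPy on concrete arrays."""
    u = jax.random.uniform(key, log_alpha.shape)
    gumbel = -jnp.log(-jnp.log(u + 1e-20) + 1e-20)
    noisy = log_alpha + gumbel
    p_soft = sinkhorn(noisy, tau)
    # tau -> 0 limit: maximise the total noisy score with the Hungarian algorithm.
    rows, cols = linear_sum_assignment(-np.asarray(noisy))
    p_hard = jnp.zeros_like(p_soft).at[rows, cols].set(1.0)
    # Forward value is p_hard; the backward pass sees only p_soft.
    return p_soft + jax.lax.stop_gradient(p_hard - p_soft)


if __name__ == "__main__":
    key = jax.random.PRNGKey(0)
    key, subkey = jax.random.split(key)
    log_alpha = jax.random.normal(subkey, (5, 5))  # stand-in for the output of h_phi(L, Sigma)
    P = sample_permutation_st(log_alpha, key, tau=0.5)
    print(P.round(2))  # a hard permutation matrix on the forward pass
```

The `p_soft + stop_gradient(p_hard - p_soft)` line is the standard straight-through trick: the sampled value equals the hard permutation, while gradients flow through the relaxed doubly-stochastic matrix, matching the forward/backward split described in the quoted setup.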