BayesDAG: Gradient-Based Posterior Inference for Causal Discovery

Authors: Yashas Annadani, Nick Pawlowski, Joel Jennings, Stefan Bauer, Cheng Zhang, Wenbo Gong

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical evaluation on synthetic and real-world datasets demonstrates our approach's effectiveness compared to state-of-the-art baselines. We also demonstrate that our method can be easily scaled to 100 variables with nonlinear relationships."
Researcher Affiliation | Collaboration | Yashas Annadani (1, 3, 4), Nick Pawlowski (2), Joel Jennings (2), Stefan Bauer (3, 4), Cheng Zhang (2), Wenbo Gong (2). Affiliations: 1 KTH Royal Institute of Technology, Stockholm; 2 Microsoft Research; 3 Helmholtz AI, Munich; 4 TU Munich.
Pseudocode | Yes | Algorithm 1: BayesDAG SG-MCMC+VI Inference; Algorithm 2: Joint inference.
Open Source Code | No | The paper provides links and descriptions for the code of *baseline* methods in Appendix F ("Code and License"). However, it does not explicitly state that the source code for the authors' own method (BayesDAG) is available, nor does it provide a link to it.
Open Datasets | Yes | "Following previous work, we generate data by randomly sampling DAGs from Erdős-Rényi (ER) [19] or Scale-Free (SF) [5] graphs with per-node degree 2 and drawing at random ground truth parameters for linear or nonlinear models. We also evaluate on a real dataset which measures the expression level of different proteins and phospholipids in human cells (called the Sachs Protein Cells Dataset) [59]." The paper additionally uses the "SynTReN simulator [67]". "For the SynTReN [67] and Sachs Protein Cells [59] datasets, we use the data provided with the repository https://github.com/kurowasan/GraN-DAG (MIT license)." (A sketch of the synthetic data generation follows the table.)
Dataset Splits | Yes | "For the d = 5 linear case, we sample at random N = 500 samples from the SCM for training and N = 100 for held-out evaluation. For higher-dimensional settings, we consider N = 5000 random samples for training and N = 1000 samples for held-out evaluation. We employ a cross-validation-like procedure for hyperparameter tuning in BayesDAG and DIBS to optimize the MMD to the true posterior (for the d = 5 linear setting) and the E-SHD value (for the nonlinear setting). For each ER and SF dataset with varying dimensions, we initially generate five tuning datasets." (The split sizes are summarized in a sketch after the table.)
Hardware Specification | Yes | "Table 5: Walltime results (in minutes, rounded to the nearest minute) of the runtime of different approaches on a single 40 GB A100 NVIDIA GPU. The N/A fields indicate that the corresponding method cannot be run within the memory constraints of a single GPU."
Software Dependencies | No | The paper mentions using multi-layer perceptrons (MLPs) with ReLU nonlinearity, Leaky ReLU as activation, and the Gumbel-softmax trick. However, it does not provide specific version numbers for any software libraries, frameworks (such as PyTorch or TensorFlow), or other dependencies. (A Gumbel-softmax sketch follows the table.)
Experiment Setup | Yes | "For BayesDAG, we run 10 parallel SG-MCMC chains for p and Θ. We set the Sinkhorn temperature t to be 0.2. For the reparametrization of the W matrix with the Gumbel-softmax trick, we use temperature 0.2. We use 0.0003 for the SG-MCMC learning rate l and batch size 512. We run 700 epochs to make sure the model is fully converged. Table 3 shows the hyperparameter selection for BayesDAG for each setting. For DIBS, we use 0.1 for the Gumbel-softmax temperature. We run 10000 epochs for convergence. The learning rate is selected as 0.01." (These settings are collected in a config sketch after the table.)
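
The ER/SF generation recipe quoted under "Open Datasets" is a standard one, so a minimal sketch is given below, assuming NumPy and networkx. The helper names `sample_er_dag` and `sample_linear_scm` are ours, not the authors' code, and the weight and noise distributions are illustrative assumptions.

```python
import networkx as nx
import numpy as np

def sample_er_dag(d, expected_degree=2, seed=0):
    """Sample an Erdős-Rényi graph with ~expected_degree edges per node and
    orient each edge from lower to higher node index, guaranteeing a DAG.
    (For SF graphs, nx.barabasi_albert_graph(d, 2) can be oriented the same way.)"""
    p = expected_degree / (d - 1)  # E[degree] = p * (d - 1)
    g = nx.gnp_random_graph(d, p, seed=seed)
    A = np.zeros((d, d))
    for u, v in g.edges():
        A[min(u, v), max(u, v)] = 1.0
    return A

def sample_linear_scm(A, n, seed=0):
    """Draw ground-truth edge weights at random and simulate a linear-Gaussian SCM."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    W = A * rng.uniform(0.5, 2.0, size=(d, d)) * rng.choice([-1.0, 1.0], size=(d, d))
    X = np.zeros((n, d))
    for v in range(d):  # the low-to-high orientation makes 0..d-1 a topological order
        X[:, v] = X @ W[:, v] + rng.standard_normal(n)
    return X

X = sample_linear_scm(sample_er_dag(d=5), n=600)  # e.g. 500 train + 100 held-out
```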
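For quick reference, the split sizes quoted under "Dataset Splits" can be collected as follows; this is our own summary in code form, not the authors' tooling.

```python
# Split sizes quoted in the paper; the dict layout and key names are ours.
SPLIT_SIZES = {
    "d5_linear":  {"train": 500,  "heldout": 100},
    "higher_dim": {"train": 5000, "heldout": 1000},
}

def train_heldout_split(X, n_train, n_heldout):
    """First n_train rows for training, next n_heldout rows for held-out evaluation."""
    return X[:n_train], X[n_train:n_train + n_heldout]
```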
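The building blocks named under "Software Dependencies" (an MLP with (Leaky)ReLU activations and the Gumbel-softmax trick) can be sketched in PyTorch as below; the layer sizes and the edge-logit layout are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative MLP with a Leaky ReLU activation (sizes are assumptions).
mlp = nn.Sequential(
    nn.Linear(8, 32),
    nn.LeakyReLU(),
    nn.Linear(32, 1),
)
out = mlp(torch.randn(4, 8))

# Gumbel-softmax relaxation of discrete edge indicators at the paper's
# temperature of 0.2; the (d, d, 2) logit layout is an assumption.
logits = torch.randn(5, 5, 2)                     # per-edge logits for {absent, present}
sample = F.gumbel_softmax(logits, tau=0.2, hard=True)
adjacency = sample[..., 1]                        # straight-through 0/1 edge indicators
```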
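Finally, the hyperparameters quoted under "Experiment Setup" are gathered into a single sketch below; the key names and the dict form are ours, not the authors' configuration files.

```python
# Hyperparameters quoted in the paper (the key names are our own).
BAYESDAG_CONFIG = {
    "num_sgmcmc_chains": 10,            # parallel SG-MCMC chains for p and Theta
    "sinkhorn_temperature": 0.2,
    "gumbel_softmax_temperature": 0.2,  # for reparametrizing the W matrix
    "sgmcmc_learning_rate": 3e-4,
    "batch_size": 512,
    "epochs": 700,
}

DIBS_CONFIG = {
    "gumbel_softmax_temperature": 0.1,
    "epochs": 10000,
    "learning_rate": 0.01,
}
```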