BayesDAG: Gradient-Based Posterior Inference for Causal Discovery
Authors: Yashas Annadani, Nick Pawlowski, Joel Jennings, Stefan Bauer, Cheng Zhang, Wenbo Gong
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines. We also demonstrate that our method can be easily scaled to 100 variables with nonlinear relationships. |
| Researcher Affiliation | Collaboration | Yashas Annadani (1,3,4), Nick Pawlowski (2), Joel Jennings (2), Stefan Bauer (3,4), Cheng Zhang (2), Wenbo Gong (2). Affiliations: 1 KTH Royal Institute of Technology, Stockholm; 2 Microsoft Research; 3 Helmholtz AI, Munich; 4 TU Munich |
| Pseudocode | Yes | Algorithm 1: BayesDAG SG-MCMC+VI Inference; Algorithm 2: Joint inference (a minimal SGLD update sketch appears after this table) |
| Open Source Code | No | The paper provides links and descriptions for the code of *baseline* methods in Appendix F ('Code and License'). However, it does not explicitly state that the source code for *their own* method (BayesDAG) is available or provide a link for it. |
| Open Datasets | Yes | Following previous work, we generate data by randomly sampling DAGs from Erdős-Rényi (ER) [19] or Scale-Free (SF) [5] graphs with per-node degree 2 and drawing at random ground-truth parameters for linear or nonlinear models (see the data-generation sketch after this table). We also evaluate on a real dataset which measures the expression level of different proteins and phospholipids in human cells (the Sachs Protein Cells dataset) [59], as well as on data from the SynTReN simulator [67]. For the SynTReN [67] and Sachs Protein Cells [59] datasets, we use the data provided with the repository https://github.com/kurowasan/GraN-DAG (MIT license). |
| Dataset Splits | Yes | For the d = 5 linear case, we sample at random N = 500 samples from the SCM for training and N = 100 for held-out evaluation. For higher-dimensional settings, we consider N = 5000 random samples for training and N = 1000 samples for held-out evaluation. We employ a cross-validation-like procedure for hyperparameter tuning in BayesDAG and DIBS to optimize the MMD to the true posterior (for the d = 5 linear setting) and the E-SHD value (for the nonlinear setting). For each ER and SF dataset with varying dimensions, we initially generate five tuning datasets. |
| Hardware Specification | Yes | Table 5: Walltime results (in minutes, rounded to the nearest minute) of the runtime of different approaches on a single 40GB A100 NVIDIA GPU. The N/A fields indicate that the corresponding method cannot be run within the memory constraints of a single GPU. |
| Software Dependencies | No | The paper mentions using Multi-Layer Perceptrons (MLPs) with ReLU nonlinearity, Leaky ReLU as activation, and a Gumbel-softmax trick. However, it does not provide specific version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or other dependencies. |
| Experiment Setup | Yes | For BayesDAG, we run 10 parallel SG-MCMC chains for p and Θ. We set the Sinkhorn temperature t to 0.2. For the reparametrization of the W matrix with the Gumbel-softmax trick, we use temperature 0.2 (see the Gumbel-softmax sketch after this table). We use 0.0003 for the SG-MCMC learning rate l and a batch size of 512. We run 700 epochs to make sure the model is fully converged. Table 3 shows the hyperparameter selection for BayesDAG for each setting. For DIBS, we use 0.1 for the Gumbel-softmax temperature. We run 10000 epochs for convergence. The learning rate is selected as 0.01. |
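The quoted Algorithm 1 combines SG-MCMC sampling with variational inference, and the experiment setup above runs 10 parallel chains with learning rate 0.0003. As a rough illustration of the SG-MCMC half only, the sketch below shows a plain SGLD update applied across parallel chains; the function name, the use of unpreconditioned SGLD, and the toy dimensions are assumptions for illustration, not the paper's exact sampler.

```python
import torch

def sgld_step(params, grad_log_post, lr=3e-4):
    """One SGLD update per chain: a gradient step on the (stochastic)
    log-posterior plus Gaussian noise with variance 2*lr, so each chain
    targets the posterior. A generic sketch, not the paper's exact
    preconditioned sampler."""
    noise = torch.randn_like(params) * (2.0 * lr) ** 0.5
    return params + lr * grad_log_post + noise

# 10 parallel chains over a (d, d) parameter, mirroring the quoted setup;
# in practice grad_log_post would come from minibatch gradients.
chains = torch.randn(10, 5, 5)
chains = sgld_step(chains, grad_log_post=torch.zeros_like(chains))
```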
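To make the quoted synthetic-data protocol concrete, here is a minimal sketch of sampling an ER DAG with expected per-node degree 2 and drawing linear-Gaussian SCM data with the d = 5 split sizes quoted in the Dataset Splits row. The helper names, weight range, and noise scale are assumptions, not the authors' generator.

```python
import numpy as np

def sample_er_dag(d, degree=2, rng=None):
    """Sample an Erdős-Rényi DAG: edges only from lower to higher
    node index, with edge probability set for the target degree."""
    if rng is None:
        rng = np.random.default_rng()
    p = min(1.0, degree / (d - 1))  # expected per-node degree ~ `degree`
    return np.triu(rng.random((d, d)) < p, k=1).astype(float)

def sample_linear_scm_data(adj, n, noise_scale=1.0, rng=None):
    """Draw n samples from a linear-Gaussian SCM over the given DAG."""
    if rng is None:
        rng = np.random.default_rng()
    d = adj.shape[0]
    weights = adj * rng.uniform(0.5, 2.0, size=(d, d))  # assumed weight range
    x = np.zeros((n, d))
    for j in range(d):  # node index is a topological order by construction
        x[:, j] = x @ weights[:, j] + noise_scale * rng.standard_normal(n)
    return x

adj = sample_er_dag(d=5)
x_train = sample_linear_scm_data(adj, n=500)  # training split
x_test = sample_linear_scm_data(adj, n=100)   # held-out split
```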
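The Gumbel-softmax reparametrization of the edge matrix W mentioned in the Experiment Setup row can be sketched as follows. The temperature 0.2 matches the quoted setting; the two-class construction and the `gumbel_softmax_edges` helper are assumptions about how such a relaxation is typically wired up in PyTorch, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_edges(logits, temperature=0.2):
    """Relaxed Bernoulli sample for each potential edge.
    `logits` is a (d, d) tensor of edge log-odds."""
    # F.gumbel_softmax expects a category axis, so stack the
    # "edge present" logit against a zero "edge absent" logit.
    two_class = torch.stack([logits, torch.zeros_like(logits)], dim=-1)
    soft = F.gumbel_softmax(two_class, tau=temperature, hard=False)
    return soft[..., 0]  # relaxed edge indicator in (0, 1)

logits = torch.zeros(5, 5, requires_grad=True)
w_relaxed = gumbel_softmax_edges(logits)  # differentiable w.r.t. logits
```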