Towards Scalable Bayesian Learning of Causal DAGs
Authors: Jussi Viinikka, Antti Hyttinen, Johan Pensar, Mikko Koivisto
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments demonstrate the performance of our methods in detecting ancestor-descendant relations, and in causal effect estimation our Bayesian method is shown to outperform previous approaches. |
| Researcher Affiliation | Academia | Jussi Viinikka, Department of Computer Science, University of Helsinki, jussi.viinikka@helsinki.fi; Antti Hyttinen, HIIT & Department of Computer Science, University of Helsinki, antti.hyttinen@helsinki.fi; Johan Pensar, Department of Mathematics, University of Oslo, johanpen@math.uio.no; Mikko Koivisto, Department of Computer Science, University of Helsinki, mikko.koivisto@helsinki.fi |
| Pseudocode | Yes | Algorithm 1 The Gadget method for sampling DAGs; Algorithm 2 The Beeps method for sampling from the posterior of linear causal effects. |
| Open Source Code | Yes | We provide a Python interface for both algorithms, with many time critical parts implemented in C++. For source code see https://www.cs.helsinki.fi/group/sop/gadget-beeps. |
| Open Datasets | Yes | We also ran the algorithms on 8 data sets obtained from the UCI machine learning repository [6], with up to 23 variables, using available preprocessed sets [25]... Finally, we obtained 50 datasets with 100–1600 data points from a benchmark Gaussian BN on gene expressions of Arabidopsis thaliana with n = 107 nodes [34, 30]. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits. It mentions generating or obtaining data but no specific split percentages or counts. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'Python interface' and 'C++', and refers to 'standard software [33, 13, 12, 1]' (e.g., 'bnlearn R package'), but does not provide specific version numbers for any software components. |
| Experiment Setup | Yes | We set n to 20 to enable exact evaluation of the achieved coverage and comparison to the best possible performance (Opt, cf. Prop. 2). We sampled two data sets of size N = 50 and N = 200 from each of 100 synthetic linear Gaussian DAGs, generated so that the expected neighborhood size was 4, the edge coefficients and the variances of the disturbances uniformly distributed on [0.1, 2] and [0.5, 2], respectively... We ran M = 16 shorter heated chains in parallel... selecting K = 6, 9, 12 candidate parents. |
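
The synthetic setup quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' generator: the function names are hypothetical, and the sketch assumes a uniformly random causal order, independent inclusion of each possible edge with probability 4 / (n - 1) so that the expected neighborhood size is 4, and positive edge coefficients. Only the parameter ranges, n = 20, and the sample sizes N = 50 and N = 200 come from the quoted text.

```python
import numpy as np


def sample_linear_gaussian_dag(n=20, expected_neighbors=4.0, rng=None):
    """Draw a random linear Gaussian DAG (illustrative, not the paper's code)."""
    rng = np.random.default_rng(rng)
    # Assumption: a uniformly random causal order; edges point forward in it.
    order = rng.permutation(n)
    # Assumption: each possible edge is included independently, so the
    # expected number of neighbors of a node is (n - 1) * p.
    p = expected_neighbors / (n - 1)
    B = np.zeros((n, n))  # B[i, j] != 0 encodes the edge i -> j
    for a in range(n):
        for b in range(a + 1, n):
            if rng.random() < p:
                # Edge coefficients uniform on [0.1, 2], as quoted above.
                B[order[a], order[b]] = rng.uniform(0.1, 2.0)
    # Disturbance variances uniform on [0.5, 2], as quoted above.
    variances = rng.uniform(0.5, 2.0, size=n)
    return B, variances, order


def sample_data(B, variances, order, N, rng=None):
    """Sample N data points by ancestral sampling along the causal order."""
    rng = np.random.default_rng(rng)
    n = B.shape[0]
    X = np.zeros((N, n))
    for j in order:
        noise = rng.normal(0.0, np.sqrt(variances[j]), size=N)
        # Parents of j precede it in the order, so their columns are filled.
        X[:, j] = X @ B[:, j] + noise
    return X


B, variances, order = sample_linear_gaussian_dag(n=20, rng=0)
X_small = sample_data(B, variances, order, N=50, rng=1)
X_large = sample_data(B, variances, order, N=200, rng=2)
```

Repeating the first call for 100 different seeds and sampling two data sets per DAG would mirror the quoted design of 100 synthetic DAGs with data sets of size 50 and 200.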