A Bayesian Approach for Estimating Causal Effects from Observational Data

Authors: Johan Pensar, Topi Talvitie, Antti Hyttinen, Mikko Koivisto
Pages: 5395-5402

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated the performance of our method by three empirical studies. The first study uses simulated data to examine the behaviour of our posterior and, in particular, the accuracy of the resulting estimates as compared to those of different IDA variants. In the second study, we assess the accuracy of our method as well as ancestor relation probabilities in terms of causal effect discovery. The third study demonstrates the applicability of our approach to real-world data.
Researcher Affiliation | Academia | Johan Pensar (Dept. of Math. and Stat., University of Helsinki, johan.pensar@helsinki.fi); Topi Talvitie (Dept. of Computer Science, University of Helsinki, topi.talvitie@helsinki.fi); Antti Hyttinen (HIIT & Dept. of CS, University of Helsinki, antti.hyttinen@helsinki.fi); Mikko Koivisto (Dept. of Computer Science, University of Helsinki, mikko.koivisto@helsinki.fi)
Pseudocode | Yes | Algorithm 1: Computing the unnormalized parent set and ancestor relation probabilities. (1) Compute the zeta transform ŵ_v of the local weight function w_v for each v ∈ V. (2) Compute the forward function f and the auxiliary function g using Eqs. (12) and (14). (3) Compute the backward function b_i for each i ∈ V using Eq. (13). (4) Compute the weight W_i(S) for each i ∈ V and S ⊆ V \ {i} using Eq. (10). (5) Compute the weight W_{i,j} for each pair i, j ∈ V using Eq. (11).
Open Source Code | Yes | The code package and the supplementary material are available at https://github.com/jopensar/BIDA.
Open Datasets | Yes | Finally, we apply our method on observational flow cytometry data... As an example of a possible application for our method, we consider the flow cytometry data (Sachs et al. 2005).
Dataset Splits | No | The paper mentions generating data sets with increasing sample sizes (n = 50, 200, 800) but does not specify explicit train/validation/test splits for these data sets in the main text.
Hardware Specification | Yes | For a data set on 20 variables, the computations take about 25 minutes on a modern laptop computer (single thread, Intel Core i7-6600U, 2.60 GHz).
Software Dependencies | No | The paper states that Algorithm 1 was implemented in C++ and the rest of the method in R, but it does not give version numbers for these languages or for any dependent libraries or packages.
Experiment Setup | Yes | For calculating the parent set and ancestor relation probabilities, we limited the parent set size to 6 and used the fractional marginal likelihood (Consonni and Rocca 2012), with α_Ω = d - 1 and n_0 = 1, together with a uniform graph prior. The hyperparameters in the prior for the linear model (5) were set as follows: m_0 = 0, Λ_0 = I, and a_0 = b_0 = 1.
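
Step 1 of Algorithm 1 in the Pseudocode row relies on a zeta transform over subsets of V. The following is a minimal sketch of that step, assuming the standard subset-sum definition (the transformed value at S is the sum of the weights of all subsets of S) and plain, non-logarithmic weights. The function name `zeta_transform` and the bitmask encoding of subsets are illustrative assumptions; they are not taken from the BIDA package, and the sketch ignores the parent set size limit mentioned in the Experiment Setup row.

```python
def zeta_transform(w, n):
    """Fast zeta transform over subsets of an n-element ground set.

    w is a list of length 2**n, where index S is a bitmask encoding a subset
    (bit i set means element i is in S). Returns w_hat such that
    w_hat[S] = sum of w[T] over all subsets T of S.

    Assumption: plain additive weights; a practical implementation would
    likely work with log-weights to avoid underflow.
    """
    if len(w) != 1 << n:
        raise ValueError("w must have exactly 2**n entries")
    w_hat = list(w)  # copy so the input is left untouched
    for i in range(n):
        bit = 1 << i
        for mask in range(1 << n):
            if mask & bit:
                # Add the contribution of the same subset without element i.
                w_hat[mask] += w_hat[mask ^ bit]
    return w_hat


def zeta_transform_naive(w, n):
    """Direct evaluation of the definition, used only as a cross-check."""
    out = []
    for s in range(1 << n):
        # t is a subset of s exactly when t has no bit outside s.
        out.append(sum(w[t] for t in range(1 << n) if t & ~s == 0))
    return out


if __name__ == "__main__":
    weights = [0.5, 1.0, 2.0, 0.25, 3.0, 1.5, 0.75, 4.0]  # hypothetical values
    print(zeta_transform(weights, 3))
```

The fast version runs in O(n 2^n) additions, compared with O(4^n) for the naive one, which is the reason subset transforms of this kind are practical for around 20 variables.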
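
To make the prior hyperparameters in the Experiment Setup row concrete, here is a minimal sketch of a conjugate posterior update for a linear model. It assumes the common normal-inverse-gamma parameterization (coefficient prior with mean m0 and precision scaled by Λ0, noise variance prior with shape a0 and rate b0); whether this matches the paper's model (5) term by term is an assumption. For brevity the sketch is restricted to a single predictor without intercept, so Λ0 = I reduces to the scalar 1. The function name `nig_posterior` is hypothetical.

```python
def nig_posterior(x, y, m0=0.0, lam0=1.0, a0=1.0, b0=1.0):
    """Posterior hyperparameters for y_i = beta * x_i + noise.

    Assumed prior (standard conjugate form, not confirmed by the source):
        beta | sigma^2 ~ Normal(m0, sigma^2 / lam0)
        sigma^2        ~ InverseGamma(a0, b0)
    Defaults mirror the reported setup: m0 = 0, Lambda0 = I (here 1),
    a0 = b0 = 1.
    Returns (m_n, lam_n, a_n, b_n).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)

    lam_n = lam0 + sxx                    # posterior precision factor
    m_n = (lam0 * m0 + sxy) / lam_n       # posterior mean of beta
    a_n = a0 + n / 2.0                    # posterior shape
    b_n = b0 + 0.5 * (syy + lam0 * m0 * m0 - lam_n * m_n * m_n)  # posterior rate
    return m_n, lam_n, a_n, b_n


if __name__ == "__main__":
    # Hypothetical toy data with a least-squares slope of exactly 2.
    m_n, lam_n, a_n, b_n = nig_posterior([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
    print(m_n, lam_n, a_n, b_n)
```

On the toy data the posterior mean is 28/15, slightly below the least-squares slope of 2, which shows how the prior mean of 0 shrinks the estimated effect when the sample is small.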