Detecting hidden confounding in observational data using multiple environments

Authors: Rickard Karlsson, Jesse Krijthe

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Additionally, we propose a procedure to test these independencies and study its empirical finite-sample behavior using simulation studies and semi-synthetic data based on a real-world dataset.
Researcher Affiliation | Academia | Rickard K.A. Karlsson, Department of Intelligent Systems, Delft University of Technology, The Netherlands, r.k.a.karlsson@tudelft.nl; Jesse H. Krijthe, Department of Intelligent Systems, Delft University of Technology, The Netherlands, j.h.krijthe@tudelft.nl
Pseudocode | Yes | Algorithm 1: Algorithm for statistically testing the presence of hidden confounding
Open Source Code | Yes | Code available at github.com/RickardKarl/detect-hidden-confounding.
Open Datasets | Yes | We use data from twin births in the USA between 1989-1991 [Almond et al., 2005, Louizos et al., 2017] to construct an observational dataset with a known causal structure.
Dataset Splits | No | The paper describes generating synthetic and semi-synthetic data for its experiments but does not provide explicit train/validation/test dataset splits, percentages, or references to predefined splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU models, or cloud instance types.
Software Dependencies | No | The paper mentions statistical tests and software packages, such as the 'dagitty' package in R, but does not provide specific version numbers for the software dependencies or libraries used in the experiments.
Experiment Setup | Yes | For the synthetic data experiments, we generate data as follows: we have the confounder $U_i^{(k)} \sim \mathcal{N}(\Theta_U^{(k)}, 1)$; treatment $T_i^{(k)} \sim \mathrm{Ber}(\mathrm{Sigm}(U_i^{(k)} + \Theta_T^{(k)}))$; and outcome $Y_i^{(k)} \sim \mathrm{Ber}(\mathrm{Sigm}(\lambda U_i^{(k)} + T_i^{(k)} + \Theta_Y^{(k)}))$. Note that $\mathrm{Sigm}(x) = 1/(1 + e^{-x})$ is the logistic function and $\Theta_V^{(k)} \sim \mathcal{N}(0, \sigma_{\Theta_V}^2)$ for $V \in \{T, Y, U\}$. Unless otherwise stated, we use $\sigma_{\Theta_T} = \sigma_{\Theta_U} = \sigma_{\Theta_Y} = 1$. We control the strength of confounding by varying $\lambda$, where $\lambda = 0$ corresponds to no confounding. Unless otherwise stated, each experiment is repeated 50 times with a significance level $\alpha = 0.05$. Depending on the variable types in each experiment, we state which conditional independence testing method our algorithm uses. For the following experiments, we use the Kernel Conditional Independence Test [Zhang et al., 2012] in our algorithm because the variables are continuous and, unless otherwise stated, combine 50 hypothesis tests using Fisher's method. (A simulation sketch of this setup follows the table.)
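Below is a minimal sketch, not the authors' released implementation, of the synthetic data-generating process quoted in the Experiment Setup row, together with the step of combining per-environment p-values using Fisher's method. The per-environment test shown here is a plain chi-squared association test between T and Y, used purely as a placeholder; the paper's Algorithm 1 tests specific conditional independencies with the Kernel Conditional Independence Test, which is not reproduced here. Function names such as sample_environment and placeholder_pvalue are illustrative, not taken from the paper or repository.

```python
# Sketch of the described data-generating process and Fisher's combination step.
# Assumptions: placeholder per-environment test (chi-squared on T vs Y), not the
# paper's KCI-based conditional independence test or its exact tested independencies.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_environment(n, lam, sigma_theta=1.0):
    """Draw one environment k: confounder U, binary treatment T, binary outcome Y."""
    # Environment-specific mechanism parameters Theta_V^(k) ~ Normal(0, sigma_theta^2)
    theta_u, theta_t, theta_y = rng.normal(0.0, sigma_theta, size=3)
    u = rng.normal(theta_u, 1.0, size=n)                    # confounder U_i^(k)
    t = rng.binomial(1, sigmoid(u + theta_t))               # treatment T_i^(k)
    y = rng.binomial(1, sigmoid(lam * u + t + theta_y))     # outcome Y_i^(k); lam=0 means no confounding
    return u, t, y

def placeholder_pvalue(t, y):
    """Stand-in for a proper (conditional) independence test; NOT the KCI test from the paper."""
    table = np.zeros((2, 2))
    for ti, yi in zip(t, y):
        table[ti, yi] += 1
    _, p, _, _ = stats.chi2_contingency(table)
    return p

# Generate K environments and combine the K per-environment p-values with Fisher's
# method, mirroring "combine 50 hypothesis tests using Fisher's method" in the setup.
K, n, lam, alpha = 50, 500, 1.0, 0.05
pvals = [placeholder_pvalue(*sample_environment(n, lam)[1:]) for _ in range(K)]
stat, p_combined = stats.combine_pvalues(pvals, method="fisher")
print(f"Fisher combined p-value: {p_combined:.3g} (reject at alpha={alpha}: {p_combined < alpha})")
```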