Detecting hidden confounding in observational data using multiple environments

Authors: Rickard Karlsson, Jesse Krijthe

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Additionally, we propose a procedure to test these independencies and study its empirical finite-sample behavior using simulation studies and semi-synthetic data based on a real-world dataset.
Researcher Affiliation | Academia | Rickard K.A. Karlsson, Department of Intelligent Systems, Delft University of Technology, The Netherlands, r.k.a.karlsson@tudelft.nl; Jesse H. Krijthe, Department of Intelligent Systems, Delft University of Technology, The Netherlands, j.h.krijthe@tudelft.nl
Pseudocode | Yes | Algorithm 1: Algorithm for statistically testing the presence of hidden confounding
Open Source Code | Yes | Code available at github.com/RickardKarl/detect-hidden-confounding.
Open Datasets | Yes | We use data from twin births in the USA between 1989-1991 [Almond et al., 2005, Louizos et al., 2017] to construct an observational dataset with a known causal structure.
Dataset Splits | No | The paper describes generating synthetic and semi-synthetic data for its experiments but does not provide explicit train/validation/test dataset splits, percentages, or references to predefined splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU models, or cloud instance types.
Software Dependencies | No | The paper mentions statistical tests and software packages, such as the 'dagitty' package in R, but does not provide specific version numbers for the software dependencies or libraries used in the experiments.
Experiment Setup | Yes | For the synthetic data experiments, we generate data as follows: we have the confounder $U_i^{(k)} \sim \mathcal{N}(\Theta_U^{(k)}, 1)$; treatment $T_i^{(k)} \sim \mathrm{Ber}(\mathrm{Sigm}(U_i^{(k)} + \Theta_T^{(k)}))$; and outcome $Y_i^{(k)} \sim \mathrm{Ber}(\mathrm{Sigm}(\lambda U_i^{(k)} + T_i^{(k)} + \Theta_Y^{(k)}))$. Note that $\mathrm{Sigm}(x) = 1/(1 + e^{-x})$ is the logistic function and $\Theta_V^{(k)} \sim \mathcal{N}(0, \sigma_{\Theta_V}^2)$ for $V \in \{T, Y, U\}$. Unless otherwise stated, we use $\sigma_{\Theta_T} = \sigma_{\Theta_U} = \sigma_{\Theta_Y} = 1$. We control the strength of confounding by varying $\lambda$, where $\lambda = 0$ corresponds to no confounding. Unless otherwise stated, each experiment is repeated 50 times with a significance level $\alpha = 0.05$. Depending on the variable types in each experiment, we state which conditional independence testing method our algorithm uses. For the following experiments, we use the Kernel Conditional Independence Test [Zhang et al., 2012] in our algorithm because the variables are continuous and, unless otherwise stated, combine 50 hypothesis tests using Fisher's method. (A simulation sketch of this setup follows the table.)
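Below is a minimal sketch, not the authors' released implementation, of the synthetic data-generating process quoted in the Experiment Setup row, together with the step of combining per-environment p-values using Fisher's method. The per-environment test shown here is a plain chi-squared association test between T and Y, used purely as a placeholder; the paper's Algorithm 1 tests specific conditional independencies with the Kernel Conditional Independence Test, which is not reproduced here. Function names such as sample_environment and placeholder_pvalue are illustrative, not taken from the paper or repository.

```python
# Sketch of the described data-generating process and Fisher's combination step.
# Assumptions: placeholder per-environment test (chi-squared on T vs Y), not the
# paper's KCI-based conditional independence test or its exact tested independencies.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_environment(n, lam, sigma_theta=1.0):
    """Draw one environment k: confounder U, binary treatment T, binary outcome Y."""
    # Environment-specific mechanism parameters Theta_V^(k) ~ Normal(0, sigma_theta^2)
    theta_u, theta_t, theta_y = rng.normal(0.0, sigma_theta, size=3)
    u = rng.normal(theta_u, 1.0, size=n)                    # confounder U_i^(k)
    t = rng.binomial(1, sigmoid(u + theta_t))               # treatment T_i^(k)
    y = rng.binomial(1, sigmoid(lam * u + t + theta_y))     # outcome Y_i^(k); lam=0 means no confounding
    return u, t, y

def placeholder_pvalue(t, y):
    """Stand-in for a proper (conditional) independence test; NOT the KCI test from the paper."""
    table = np.zeros((2, 2))
    for ti, yi in zip(t, y):
        table[ti, yi] += 1
    _, p, _, _ = stats.chi2_contingency(table)
    return p

# Generate K environments and combine the K per-environment p-values with Fisher's
# method, mirroring "combine 50 hypothesis tests using Fisher's method" in the setup.
K, n, lam, alpha = 50, 500, 1.0, 0.05
pvals = [placeholder_pvalue(*sample_environment(n, lam)[1:]) for _ in range(K)]
stat, p_combined = stats.combine_pvalues(pvals, method="fisher")
print(f"Fisher combined p-value: {p_combined:.3g} (reject at alpha={alpha}: {p_combined < alpha})")
```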