Towards Estimating Bounds on the Effect of Policies under Unobserved Confounding

Authors: Alexis Bellot, Silvia Chiappa

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper aims to provide a novel estimation framework to support decision-making in non-identifiable settings, overcoming some of these challenges. We introduce several new results for the partial identification and estimation of the effect of stochastic and conditional policies from a combination of observational data and assumptions on the domain, encoded in a causal graph. Our contributions may be summarized as follows. We introduce several graphical criteria to derive new analytical bounds on the effect of policies with continuous outcomes and covariates, which improve upon the non-parametric bounds of [27, 33] and [47]. Given these analytical bounds, we then construct estimators leveraging the double machine learning [10] toolkit for scalable inference, and demonstrate that the estimators exhibit favourable statistical properties such as robustness to noise and fast convergence. The results of this paper were illustrated through synthetic simulations and a real-world health campaign example for the reduction of obesity levels.
Researcher Affiliation | Industry | Alexis Bellot and Silvia Chiappa, Google DeepMind, London, UK
Pseudocode | Yes | Algorithm 1: Bounds for the effect of policies
Open Source Code | No | The code will not be open-sourced at this moment, but we believe we have provided sufficient details to reproduce our results.
Open Datasets | Yes | The data was curated from anonymous participants from Colombia, Peru and Mexico, using a web platform [30]. The authors made the data available under a Creative Commons license, and it is currently hosted on Kaggle as a CSV file, accessible through the following link: kaggle.com/code/mpwolke/obesity-levels-life-style/ (a hypothetical loading snippet follows the table).
Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits with percentages, sample counts, or citations to predefined splits. It mentions partitioning the data equally for DML estimation (see the two-fold cross-fitting sketch after the table), but not standard train/val/test splits.
Hardware Specification | No | Experiments were run on a single CPU within a few minutes of wall time.
Software Dependencies | No | As described in the experimental section, for estimating the nuisances (γ, µ) we used Gradient Boosting models for classification and regression where appropriate. We implemented the models in Python using GradientBoostingClassifier() and GradientBoostingRegressor() with default hyperparameters (see the nuisance-model sketch after this table).
Experiment Setup | Yes | Setting 1: All nuisances estimated correctly. Setting 2: Nuisances γ̂ are sampled from a uniform distribution to induce misspecification in the estimation of γ. Setting 3: Nuisances µ̂ are sampled from a uniform distribution to induce misspecification in the estimation of µ. Setting 4: Noise ϵ is introduced in the estimation of all nuisances (γ, µ) to emphasize error due to finite-sample variation. Specifically, ϵ ∼ Normal(n^{−α}, n^{−α}) with α = 1/4, which induces a slower rate of convergence as a function of sample size, inspired by [24, 20] (a simulation sketch of this noise model follows the table).
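
The Open Datasets row points to a Kaggle-hosted CSV. A minimal loading sketch, assuming pandas and a locally downloaded copy of the file; the filename below is a placeholder, not taken from the paper:

```python
# Hypothetical snippet for reading the obesity-levels CSV once it has
# been downloaded from the Kaggle page linked above. The filename is a
# placeholder and should match the downloaded file.
import pandas as pd

df = pd.read_csv("obesity_levels.csv")  # placeholder filename
print(df.shape)
print(df.head())
```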
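The Dataset Splits row notes that the paper partitions the data equally for DML estimation. A minimal sketch of such a two-fold cross-fitting split, assuming NumPy; the function and variable names are ours, not the authors':

```python
# Illustrative two-fold cross-fitting split for double machine learning:
# nuisances are fit on one half and evaluated on the other, then the
# folds swap roles and the two resulting estimates are averaged.
import numpy as np

def equal_folds(n, seed=0):
    """Randomly partition indices 0..n-1 into two equal halves."""
    perm = np.random.default_rng(seed).permutation(n)
    return perm[: n // 2], perm[n // 2 :]

fold_a, fold_b = equal_folds(1000)
```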
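The Software Dependencies row names scikit-learn's gradient boosting models with default hyperparameters for the nuisances (γ, µ). A sketch of that setup on synthetic stand-in data; assigning the roles of covariates X, action a, and outcome y is our assumption:

```python
# Gradient Boosting nuisance models as reported in the paper's
# experiments (scikit-learn defaults); the data below is synthetic filler.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))          # covariates (stand-in)
a = rng.integers(0, 2, size=500)       # binary action/treatment (stand-in)
y = rng.normal(size=500)               # continuous outcome (stand-in)

gamma_hat = GradientBoostingClassifier().fit(X, a)        # nuisance γ
mu_hat = GradientBoostingRegressor().fit(np.c_[X, a], y)  # nuisance µ
```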
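Setting 4 injects noise ϵ ∼ Normal(n^{−α}, n^{−α}) with α = 1/4 into all nuisance estimates. A simulation sketch of that perturbation, assuming the second Normal parameter is a standard deviation (the excerpt does not say):

```python
# Illustrative Setting-4 perturbation: add ε ~ Normal(n^-α, n^-α),
# α = 1/4, to nuisance estimates so that they converge at a slower
# rate as the sample size n grows. Reading the second Normal parameter
# as a standard deviation is our assumption.
import numpy as np

def perturb(nu_hat, n, alpha=0.25, seed=0):
    """Add ε ~ Normal(n^-α, n^-α) noise to an array of nuisance values."""
    rng = np.random.default_rng(seed)
    scale = n ** (-alpha)
    return nu_hat + rng.normal(loc=scale, scale=scale, size=np.shape(nu_hat))
```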