Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding

Authors: Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, Emma Brunskill

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On simulated healthcare examples (management of sepsis and interventions for autistic children) where this is a reasonable model, we demonstrate that our method invalidates non-robust results and provides meaningful certificates of robustness, allowing reliable selection of policies under unobserved confounding.
Researcher Affiliation | Academia | Hongseok Namkoong (Decision, Risk, and Operations Division, Columbia Business School, namkoong@gsb.columbia.edu); Ramtin Keramati (Computational and Mathematical Engineering, Stanford University, keramati@cs.stanford.edu); Steve Yadlowsky (Electrical Engineering, Stanford University, syadlows@stanford.edu); Emma Brunskill (Computer Science, Stanford University, ebrun@cs.stanford.edu)
Pseudocode | No | The paper includes mathematical formulations and theorems but does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Our code is publicly available at https://github.com/StanfordAI4HI/off_policy_confounding.git
Open Datasets | Yes | Using the sepsis simulator developed by Oberst and Sontag [38], we consider a scenario where automated policies have been proposed, and we wish to evaluate their benefits. Using a simulator for autistic children developed by Lu et al. [31], which models the data from a (real) sequential randomized trial (SMART) [23], we compare different approaches for improving the number of speech utterances.
Dataset Splits | No | The paper generates data from simulators and evaluates on it, but does not specify train/validation/test splits (percentages or sample counts) for the data used in the experiments.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies or their version numbers required for replication of the experiments.
Experiment Setup | Yes | To simulate unrecorded comorbidities that could introduce confounding, we simulate an unobserved confounder associated with favorable state transitions. At t = 1, we take the optimal action with respect to all other options (vasopressors and mechanical ventilation), and administer antibiotics with probability sqrt(Gamma)/(1+sqrt(Gamma)) if the confounding variable is large, and with probability 1/(1+sqrt(Gamma)) if the confounding variable is small. This satisfies Assumption F with level Gamma. For t >= 2, the behavior policy takes the optimal next treatment action with probability 0.85, and otherwise switches the vasopressor status, independent of the confounders.
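The behavior policy in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the two antibiotic probabilities at t = 1 are sqrt(Gamma)/(1+sqrt(Gamma)) and 1/(1+sqrt(Gamma)), a choice under which the odds ratio between the two confounder values is exactly Gamma. The function names and the dictionary layout of an action (antibiotics, vasopressors, ventilation) are hypothetical and are not taken from the paper or its repository.

```python
import math
import random


def antibiotic_prob(gamma: float, confounder_large: bool) -> float:
    """Assumed probability of giving antibiotics at t = 1."""
    root = math.sqrt(gamma)
    return root / (1.0 + root) if confounder_large else 1.0 / (1.0 + root)


def odds_ratio(p: float, q: float) -> float:
    """Odds of treatment under probability p relative to probability q."""
    return (p / (1.0 - p)) * ((1.0 - q) / q)


def behavior_action(t, optimal_action, gamma, confounder_large, rng):
    """Sample one behavior-policy action (hypothetical action layout).

    optimal_action is a dict of booleans with keys 'antibiotics',
    'vasopressors', and 'ventilation'.
    """
    action = dict(optimal_action)  # copy so the input is left unchanged
    if t == 1:
        # Confounded step: only the antibiotic decision depends on the
        # unobserved confounder; the other options stay optimal.
        p = antibiotic_prob(gamma, confounder_large)
        action["antibiotics"] = rng.random() < p
    elif rng.random() >= 0.85:
        # Unconfounded steps: with probability 0.15, flip vasopressor status.
        action["vasopressors"] = not action["vasopressors"]
    return action


if __name__ == "__main__":
    gamma = 4.0
    p_large = antibiotic_prob(gamma, True)
    p_small = antibiotic_prob(gamma, False)
    print("odds ratio:", odds_ratio(p_large, p_small))
    rng = random.Random(0)
    optimal = {"antibiotics": True, "vasopressors": False, "ventilation": False}
    print(behavior_action(1, optimal, gamma, True, rng))
```

Setting gamma to 1 makes both antibiotic probabilities equal to 0.5, which removes the dependence on the confounder. Larger values of gamma make the t = 1 decision depend more strongly on it.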