Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
Authors: Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, Emma Brunskill
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On simulated healthcare examples (management of sepsis and interventions for autistic children) where this is a reasonable model, we demonstrate that our method invalidates non-robust results and provides meaningful certificates of robustness, allowing reliable selection of policies under unobserved confounding. |
| Researcher Affiliation | Academia | Hongseok Namkoong (Decision, Risk, and Operations Division, Columbia Business School; namkoong@gsb.columbia.edu); Ramtin Keramati (Computational and Mathematical Engineering, Stanford University; keramati@cs.stanford.edu); Steve Yadlowsky (Electrical Engineering, Stanford University; syadlows@stanford.edu); Emma Brunskill (Computer Science, Stanford University; ebrun@cs.stanford.edu) |
| Pseudocode | No | The paper includes mathematical formulations and theorems but does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/StanfordAI4HI/off_policy_confounding.git |
| Open Datasets | Yes | Using the sepsis simulator developed by Oberst and Sontag [38], we consider a scenario where automated policies have been proposed, and we wish to evaluate their benefits. Using a simulator for autistic children developed by Lu et al. [31], which models the data from a (real) sequential randomized trial (SMART) [23], we compare different approaches for improving the number of speech utterances. |
| Dataset Splits | No | The paper mentions data generation from simulators and evaluation but does not specify explicit train/validation/test dataset splits with percentages or sample counts for the data used in experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies or their version numbers required for replication of the experiments. |
| Experiment Setup | Yes | To simulate unrecorded comorbidities that could introduce confounding, we simulate an unobserved confounder associated with favorable state transitions. At t = 1, we take the optimal action with respect to all other options (vasopressors and mechanical ventilation), and administer antibiotics with probability sqrt(Gamma)/(1+sqrt(Gamma)) if the confounding variable is large, and with probability 1/(1+sqrt(Gamma)) if the confounding variable is small. This satisfies Assumption F with level Gamma. For t >= 2, the behavior policy takes the optimal next treatment action with probability 0.85, and otherwise switches the vasopressor status, independent of the confounders. |
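
The behavior policy described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based action encoding, and the boolean confounder flag are all hypothetical. The two antibiotic probabilities are an assumption, chosen so that the odds ratio between the large-confounder and small-confounder cases equals exactly Gamma, which is one way to meet a level-Gamma bound on confounding.

```python
import math
import random


def antibiotic_probability(confounder_is_large: bool, gamma: float) -> float:
    """Probability of giving antibiotics at t = 1 (assumed form, see lead-in)."""
    root = math.sqrt(gamma)
    if confounder_is_large:
        return root / (1.0 + root)
    return 1.0 / (1.0 + root)


def odds(p: float) -> float:
    """Odds p / (1 - p) of an event with probability p."""
    return p / (1.0 - p)


def behavior_action(t, confounder_is_large, optimal_action, gamma, rng):
    """Sample one behavior-policy action.

    `optimal_action` is a hypothetical dict with boolean entries
    "antibiotics", "vasopressors", and "ventilation".
    """
    action = dict(optimal_action)  # copy so the caller's dict is untouched
    if t == 1:
        # Confounded step: only the antibiotic decision depends on the
        # unobserved confounder; the other two components stay optimal.
        p = antibiotic_probability(confounder_is_large, gamma)
        action["antibiotics"] = rng.random() < p
    elif rng.random() >= 0.85:
        # Unconfounded steps: with probability 0.15, flip vasopressor status.
        action["vasopressors"] = not action["vasopressors"]
    return action


if __name__ == "__main__":
    rng = random.Random(0)
    optimal = {"antibiotics": True, "vasopressors": False, "ventilation": True}
    print(behavior_action(1, True, optimal, gamma=4.0, rng=rng))
    ratio = odds(antibiotic_probability(True, 4.0)) / odds(
        antibiotic_probability(False, 4.0)
    )
    print("odds ratio:", ratio)
```

With Gamma = 1 both probabilities reduce to 1/2, so the confounder has no influence on the action; larger values of Gamma let the unobserved variable shift the antibiotic decision more strongly.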