Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation

Authors: Shengpu Tang, Jenna Wiens

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In a series of proof-of-concept experiments involving bandits and a healthcare-inspired simulator, we demonstrate that our approach outperforms purely offline IS estimators and is robust to imperfect annotations. Section 5 (Experiments): First, through a suite of simple bandit problems, we verify the theoretical properties of C-IS. Then, we apply our approach to a healthcare-inspired RL simulation domain, where we compare the performance of our proposed approach, C-PDIS, to several baselines in terms of their OPE accuracy and ability to rank policies, and explore robustness to bias, noise, and missingness in the annotations.
Researcher Affiliation | Academia | Shengpu Tang, Jenna Wiens; Computer Science & Engineering, University of Michigan, Ann Arbor, MI, USA; {tangsp,wiensj}@umich.edu
Pseudocode | No | The paper provides mathematical definitions and recursive formulas for the estimators (e.g., Definition 2, Definition 4), but it does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code for all experiments is available at https://github.com/MLD3/CounterfactualAnnot-SemiOPE.
Open Datasets | Yes | Next, we apply our approach to evaluate policies in a simulated RL domain modeled after the physiology of sepsis patients [32]. Following prior work [22], we collected 50 offline datasets from the sepsis simulator (using different random seeds) each with 1000 episodes by following an ϵ-greedy behavior policy with respect to the optimal policy where ϵ = 0.1.
Dataset Splits | No | The paper describes collecting offline datasets and evaluating policies on them but does not specify explicit training, validation, and test dataset splits for the collected data.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or cloud computing specifications used for running experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or other libraries with their versions).
Experiment Setup | Yes | For the sepsis simulator: 'Following prior work [22], we collected 50 offline datasets from the sepsis simulator (using different random seeds) each with 1000 episodes by following an ϵ-greedy behavior policy with respect to the optimal policy where ϵ = 0.1.' For the bandit experiments: 'In Table 1, we display the results for R(s1, ) = 1, R(s1, ) = 2, R(s2, ) = 1, σ = 0.5.'
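
The experiment setup mentions an ϵ-greedy behavior policy with ϵ = 0.1 built around an optimal policy. The following is a minimal sketch of one common way to construct such a policy, not the authors' code. It assumes a tabular setting and assumes that the ϵ mass is spread uniformly over all actions, including the optimal one; the paper's exact convention may differ. The names `epsilon_greedy_policy`, `sample_action`, and the toy optimal actions are hypothetical.

```python
import random


def epsilon_greedy_policy(optimal_actions, n_actions, epsilon=0.1):
    """Build a tabular epsilon-greedy policy (assumed convention).

    optimal_actions[s] is the optimal action index in state s.
    Returns a list of rows, where row s holds the action probabilities in state s.
    """
    policy = []
    for best in optimal_actions:
        # Assumption: epsilon is shared uniformly across ALL actions ...
        row = [epsilon / n_actions] * n_actions
        # ... and the remaining 1 - epsilon goes to the optimal action.
        row[best] += 1.0 - epsilon
        policy.append(row)
    return policy


def sample_action(policy, state, rng):
    """Draw one action for `state` from the policy's probabilities."""
    actions = range(len(policy[state]))
    return rng.choices(actions, weights=policy[state], k=1)[0]


# Toy example (hypothetical): 3 states, 4 actions.
toy_optimal_actions = [2, 0, 1]
behavior = epsilon_greedy_policy(toy_optimal_actions, n_actions=4, epsilon=0.1)

# A separate seeded generator per dataset mirrors "using different random seeds".
rng = random.Random(0)
action = sample_action(behavior, state=0, rng=rng)
```

Under this convention the optimal action in each state is chosen with probability 1 - ϵ + ϵ/|A|, and every other action with probability ϵ/|A|, so every action has nonzero probability under the behavior policy, which importance sampling estimators rely on.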