Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions

Authors: Omer Gottesman, Joseph Futoma, Yao Liu, Sonali Parbhoo, Leo Celi, Emma Brunskill, Finale Doshi-Velez

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on medical simulations and real-world intensive care unit data demonstrate that our method can be used to identify limitations in the evaluation process and make evaluation more robust."
Researcher Affiliation | Academia | "¹Harvard University ²Stanford University ³MIT."
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | "Code for reproducing the results in this paper can be found at https://github.com/dtak/interpretable_ope_public.git"
Open Datasets | Yes | "Our data source is a subset of the publicly available MIMIC-III dataset (Johnson et al., 2016)."
Dataset Splits | No | "Our final dataset consists of 346 patient trajectories (6777 transitions) for learning a policy and another 346 trajectories (6863 transitions) for evaluation of the policy via OPE and influence analysis."
Hardware Specification | No | The paper does not provide specific details on the hardware used for running the experiments.
Software Dependencies | No | The paper does not list software dependencies with specific version numbers.
Experiment Setup | Yes | "In all figures, we highlight in red all influential transitions our method would have highlighted for review by domain experts (I_c = 0.05). As an evaluation policy, we use the most common action of a state's 50 nearest neighbors."
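
The evaluation policy quoted in the Experiment Setup row (most common action among a state's 50 nearest neighbors) can be sketched roughly as below. This is a minimal illustration, not the released implementation: the function name knn_majority_policy, the arrays states and actions, and the use of scikit-learn's NearestNeighbors are all assumptions made for the example; only the choice of k = 50 neighbors comes from the paper.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors

def knn_majority_policy(query_state, states, actions, k=50):
    """Return the most common action among the k nearest neighbors of query_state.

    states:  (N, d) array of observed states from the evaluation dataset
    actions: (N,) array of actions taken in those states
    k:       number of neighbors (the paper's setup uses 50)
    """
    # Find the indices of the k nearest observed states to the query state.
    nn = NearestNeighbors(n_neighbors=k).fit(states)
    _, idx = nn.kneighbors(np.asarray(query_state).reshape(1, -1))
    neighbor_actions = actions[idx[0]]
    # The evaluation policy's action is the majority vote over those neighbors.
    return Counter(neighbor_actions).most_common(1)[0][0]
```

In practice this policy would be queried at every state visited in the evaluation trajectories, and the resulting actions fed into the OPE and influence analysis; consult the repository linked above for the authors' actual construction.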