State Relevance for Off-Policy Evaluation

Authors: Simon P Shen, Yecheng Ma, Omer Gottesman, Finale Doshi-Velez

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we experimentally validate the efficacy of the likelihood ratio omission procedure of OSIRIS and the relevance estimation procedure in Algorithm 1. We demonstrate that they improve estimator accuracy.
Researcher Affiliation | Academia | (1) Harvard University, Cambridge, MA; (2) University of Pennsylvania, Philadelphia, PA; (3) Brown University, Providence, RI.
Pseudocode | Yes | Algorithm 1 Estimating state relevance θ̂(s; D)
Open Source Code | Yes | All code and models used to generate these results are publicly accessible at github.com/dtak/osiris.
Open Datasets | No | The paper refers to standard benchmark environments like Gridworld, Cart Pole, and Lunar Lander, and mentions collecting 'historical data' or 'trajectories' (D) but does not provide specific access information (link, DOI, formal citation) to a publicly available dataset used for training. It appears the data is generated within these environments.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or counts. It mentions evaluation of models but not how data was partitioned for validation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | All policies are ϵ-greedy where ϵ is smaller in the evaluation policy. [...] results are aggregated from 200 trials where |D| = 25 for the Gridworlds and |D| = 50 for Cart Pole and Lunar Lander. [...] For only the calculation of state relevance, we discretized the state space by creating linearly spaced bins per state dimension. [...] Welch's two-sample t-test comparing the samples G+ and G with significance level α
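
The two mechanical steps quoted under Experiment Setup, binning each state dimension into linearly spaced bins and comparing two samples of returns with Welch's two-sample t-test, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation from github.com/dtak/osiris: the function names, state bounds, and bin count are hypothetical, and the final comparison of a p-value against α is omitted because it needs a t-distribution CDF (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
from statistics import mean, variance


def linear_bin_edges(low, high, n_bins):
    """Return n_bins + 1 linearly spaced edges covering [low, high]."""
    step = (high - low) / n_bins
    return [low + i * step for i in range(n_bins + 1)]


def discretize_state(state, lows, highs, n_bins):
    """Map a continuous state to a tuple of per-dimension bin indices.

    Assumed convention (not from the paper): values outside [low, high]
    are clipped into the first or last bin.
    """
    indices = []
    for x, low, high in zip(state, lows, highs):
        frac = (x - low) / (high - low)
        idx = int(math.floor(frac * n_bins))
        indices.append(min(max(idx, 0), n_bins - 1))
    return tuple(indices)


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Per-sample variance of the mean (unbiased sample variance / n).
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical two-dimensional state with made-up bounds.
    print(discretize_state((0.0, 0.49), lows=(-1.0, 0.0), highs=(1.0, 1.0), n_bins=4))
    # Hypothetical samples of returns.
    print(welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]))
```

Welch's test is the unequal-variance form of the two-sample t-test, so the two samples need not have the same size or spread.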