State Relevance for Off-Policy Evaluation
Authors: Simon P Shen, Yecheng Ma, Omer Gottesman, Finale Doshi-Velez
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we experimentally validate the efficacy of the likelihood ratio omission procedure of OSIRIS and the relevance estimation procedure in Algorithm 1. We demonstrate that they improve estimator accuracy. |
| Researcher Affiliation | Academia | ¹Harvard University, Cambridge, MA; ²University of Pennsylvania, Philadelphia, PA; ³Brown University, Providence, RI. |
| Pseudocode | Yes | Algorithm 1: Estimating state relevance θ̂(s; D) |
| Open Source Code | Yes | All code and models used to generate these results are publicly accessible at github.com/dtak/osiris. |
| Open Datasets | No | The paper refers to standard benchmark environments like Gridworld, Cart Pole, and Lunar Lander, and mentions collecting 'historical data' or 'trajectories' (D) but does not provide specific access information (link, DOI, formal citation) to a publicly available dataset used for training. It appears the data is generated within these environments. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or counts. It mentions evaluation of models but not how data was partitioned for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | All policies are ϵ-greedy where ϵ is smaller in the evaluation policy. [...] results are aggregated from 200 trials where \|D\| = 25 for the Gridworlds and \|D\| = 50 for Cart Pole and Lunar Lander. [...] For only the calculation of state relevance, we discretized the state space by creating linearly spaced bins per state dimension. [...] Welch's two-sample t-test comparing the samples G+ and G with significance level α |
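
The Experiment Setup row names two generic building blocks: discretizing a continuous state with linearly spaced bins per dimension, and Welch's two-sample t-test. The sketch below illustrates both with the Python standard library only. It is not the authors' code: the function names, the bounds, and the bin count are hypothetical, since the excerpt does not state them, and it does not reproduce how Algorithm 1 builds or uses the two samples.

```python
import math
from statistics import mean, variance


def linear_bin_edges(low, high, n_bins):
    """Return n_bins + 1 linearly spaced edges covering [low, high]."""
    step = (high - low) / n_bins
    return [low + i * step for i in range(n_bins + 1)]


def discretize(state, lows, highs, n_bins):
    """Map a continuous state to a tuple of per-dimension bin indices.

    lows, highs, and n_bins are illustrative assumptions; the excerpt
    does not give the bounds or the number of bins used in the paper.
    """
    indices = []
    for x, low, high in zip(state, lows, highs):
        # Clip so out-of-range values fall into the first or last bin.
        frac = (min(max(x, low), high) - low) / (high - low)
        indices.append(min(int(frac * n_bins), n_bins - 1))
    return tuple(indices)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample needs at least two values and nonzero total variance.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Turning the statistic into a decision at significance level α requires the Student t distribution, which the standard library does not provide. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs Welch's test and returns the p-value directly.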