Universal Off-Policy Evaluation

Authors: Yash Chandak, Scott Niekum, Bruno da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In this section, we provide empirical support for the established theoretical results for the proposed UnO estimator and high-confidence bounds. To do so, we use the following domains: (1) An open source implementation [102] of the FDA-approved type-1 diabetes treatment simulator [59], (2) A stationary and a non-stationary recommender system domain, and (3) A continuous-state Gridworld with partial observability, where data is collected using multiple behavior policies. |
| Researcher Affiliation | Academia | Yash Chandak (University of Massachusetts); Scott Niekum (University of Texas Austin); Bruno Castro da Silva (University of Massachusetts); Erik Learned-Miller (University of Massachusetts); Emma Brunskill (Stanford University); Philip S. Thomas (University of Massachusetts) |
| Pseudocode | Yes | This procedure is outlined in Algorithm 1 in Appendix E.4. |
| Open Source Code | Yes | Code for the proposed UnO method(s) and the domains used for empirical studies are available at https://github.com/yashchandak/UnO. |
| Open Datasets | Yes | To do so, we use the following domains: (1) An open source implementation [102] of the FDA-approved type-1 diabetes treatment simulator [59], (2) A stationary and a non-stationary recommender system domain, and (3) A continuous-state Gridworld with partial observability, where data is collected using multiple behavior policies. [103] J. Xie. Simglucose v0.2.1 (2018), 2019. URL https://github.com/jxx123/simglucose. |
| Dataset Splits | No | The paper mentions collecting data (e.g., '3 104.5 samples') but does not specify explicit training, validation, or test dataset splits with percentages, absolute counts, or predefined split methods. |
| Hardware Specification | Yes | All the experiments were conducted on a personal computer with 32 GiB of memory and an Intel Core i7 CPU with 12 threads. |
| Software Dependencies | No | Our code uses Julia [11], BlackBoxOptim library [34], Python [97], and PyTorch [68]. |
| Experiment Setup | Yes | Bounds were obtained for a failure rate δ = 0.05. |