reproducibilityindex.ai

Universal Off-Policy Evaluation

Authors: Yash Chandak, Scott Niekum, Bruno da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we provide empirical support for the established theoretical results for the proposed UnO estimator and high-confidence bounds. To do so, we use the following domains: (1) An open source implementation [102] of the FDA-approved type-1 diabetes treatment simulator [59], (2) A stationary and a non-stationary recommender system domain, and (3) A continuous-state Gridworld with partial observability, where data is collected using multiple behavior policies.
Researcher Affiliation	Academia	Yash Chandak (University of Massachusetts); Scott Niekum (University of Texas Austin); Bruno Castro da Silva (University of Massachusetts); Erik Learned-Miller (University of Massachusetts); Emma Brunskill (Stanford University); Philip S. Thomas (University of Massachusetts)
Pseudocode	Yes	This procedure is outlined in Algorithm 1 in Appendix E.4.
Open Source Code	Yes	Code for the proposed UnO method(s) and the domains used for empirical studies are available at https://github.com/yashchandak/UnO.
Open Datasets	Yes	To do so, we use the following domains: (1) An open source implementation [102] of the FDA-approved type-1 diabetes treatment simulator [59], (2) A stationary and a non-stationary recommender system domain, and (3) A continuous-state Gridworld with partial observability, where data is collected using multiple behavior policies. [103] J. Xie. Simglucose v0.2.1 (2018), 2019. URL https://github.com/jxx123/simglucose.
Dataset Splits	No	The paper mentions collecting data (e.g., '3 104.5 samples') but does not specify explicit training, validation, or test dataset splits with percentages, absolute counts, or predefined split methods.
Hardware Specification	Yes	All the experiments were conducted on a personal computer with 32 GiB of memory and an Intel Core i7 CPU with 12 threads.
Software Dependencies	No	Our code uses Julia [11], Blackboxoptim library [34], Python [97], and PyTorch [68].
Experiment Setup	Yes	Bounds were obtained for a failure rate δ = 0.05.