Universal Off-Policy Evaluation
Authors: Yash Chandak, Scott Niekum, Bruno da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide empirical support for the established theoretical results for the proposed UnO estimator and high-confidence bounds. To do so, we use the following domains: (1) An open source implementation [102] of the FDA-approved type-1 diabetes treatment simulator [59], (2) A stationary and a non-stationary recommender system domain, and (3) A continuous-state Gridworld with partial observability, where data is collected using multiple behavior policies. |
| Researcher Affiliation | Academia | Yash Chandak (University of Massachusetts); Scott Niekum (University of Texas Austin); Bruno Castro da Silva (University of Massachusetts); Erik Learned-Miller (University of Massachusetts); Emma Brunskill (Stanford University); Philip S. Thomas (University of Massachusetts) |
| Pseudocode | Yes | This procedure is outlined in Algorithm 1 in Appendix E.4. |
| Open Source Code | Yes | Code for the proposed UnO method(s) and the domains used for empirical studies are available at https://github.com/yashchandak/UnO. |
| Open Datasets | Yes | To do so, we use the following domains: (1) An open source implementation [102] of the FDA-approved type-1 diabetes treatment simulator [59], (2) A stationary and a non-stationary recommender system domain, and (3) A continuous-state Gridworld with partial observability, where data is collected using multiple behavior policies. [103] J. Xie. Simglucose v0.2.1 (2018), 2019. URL https://github.com/jxx123/simglucose. |
| Dataset Splits | No | The paper mentions collecting data (e.g., '3 × 10^4.5 samples') but does not specify explicit training, validation, or test dataset splits with percentages, absolute counts, or predefined split methods. |
| Hardware Specification | Yes | All the experiments were conducted on a personal computer with 32 GiB of memory and an Intel Core i7 CPU with 12 threads. |
| Software Dependencies | No | Our code uses Julia [11], Blackboxoptim library [34], Python [97], and PyTorch [68]. |
| Experiment Setup | Yes | Bounds were obtained for a failure rate δ = 0.05. |