Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SOPE: Spectrum of Off-Policy Estimators
Authors: Christina Yuan, Yash Chandak, Stephen Giguere, Philip S. Thomas, Scott Niekum
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical evidence that estimators in this spectrum can be used to trade off between the bias and variance of IS and SIS and can achieve lower mean-squared error than both IS and SIS. Additionally, we also establish a spectrum for doubly-robust and weighted versions of these estimators. |
| Researcher Affiliation | Academia | Christina J. Yuan (University of Texas at Austin), Yash Chandak (University of Massachusetts), Stephen Giguere (University of Texas at Austin), Philip S. Thomas (University of Massachusetts), Scott Niekum (University of Texas at Austin) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a statement or link indicating that the source code for the described methodology (SOPE_n, W-SOPE_n, DR-SOPE_n) is open-source or publicly available. It mentions using the 'Caltech OPE Benchmarking Suite (COBS)' but not releasing their own code. |
| Open Datasets | No | The paper mentions using 'Graph and Toy Mountain Car environments' from the 'Caltech OPE Benchmarking Suite (COBS)', but does not provide concrete access information (link, DOI, specific citation) for a publicly available dataset used for training. It refers to environments and a benchmarking suite rather than specific datasets with access details. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or testing data. It only mentions 'different batch sizes of historical data'. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models or memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. It mentions using 'Caltech OPE Benchmarking Suite (COBS)' but does not specify its version or the versions of any other libraries or frameworks used. |
| Experiment Setup | Yes | The evaluation and behavior policies are π_e(a = 0) = 0.9 and π_b(a = 0) = 0.5 for the experiments on the Graph domain, and π_e(a = 0) = 0.5 and π_b(a = 0) = 0.6 for the Toy Mountain Car domain. |
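To make the quoted setup concrete, the sketch below estimates an evaluation policy's expected reward via ordinary importance sampling (IS), the baseline estimator the paper's spectrum interpolates from. It uses the Graph-domain policy probabilities quoted above (π_e(a=0)=0.9, π_b(a=0)=0.5) but a made-up one-step reward; this is an illustrative assumption, not the paper's Graph environment or the SOPE_n estimator itself.

```python
import random

# Two-action, one-step sketch (hypothetical reward, not the paper's Graph domain).
# Policy probabilities follow the Graph-domain setup quoted in the table:
# behavior policy pi_b(a=0) = 0.5, evaluation policy pi_e(a=0) = 0.9.
PI_B = {0: 0.5, 1: 0.5}
PI_E = {0: 0.9, 1: 0.1}

def reward(action):
    # Illustrative reward model: action 0 pays 1, action 1 pays 0,
    # so the true value under pi_e is 0.9 * 1 + 0.1 * 0 = 0.9.
    return 1.0 if action == 0 else 0.0

def is_estimate(n, seed=0):
    """Ordinary importance-sampling estimate of pi_e's expected reward
    from n samples collected under the behavior policy pi_b."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        a = 0 if rng.random() < PI_B[0] else 1  # act under pi_b
        w = PI_E[a] / PI_B[a]                   # importance weight pi_e/pi_b
        total += w * reward(a)
    return total / n

print(is_estimate(100_000))  # close to the true value 0.9
```

IS is unbiased but its weights (here 1.8 vs. 0.2) inflate variance as horizons grow, which is exactly the bias-variance trade-off against SIS that the SOPE_n spectrum is designed to navigate.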