SOPE: Spectrum of Off-Policy Estimators

Authors: Christina Yuan, Yash Chandak, Stephen Giguere, Philip S. Thomas, Scott Niekum

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide empirical evidence that estimators in this spectrum can be used to trade-off between the bias and variance of IS and SIS and can achieve lower mean-squared error than both IS and SIS. Additionally, we also establish a spectrum for doubly-robust and weighted version of these estimators.
Researcher Affiliation | Academia | Christina J. Yuan, University of Texas at Austin, cjyuan@cs.utexas.edu; Yash Chandak, University of Massachusetts, ychandak@cs.umass.edu; Stephen Giguere, University of Texas at Austin, sgiguere@cs.utexas.edu; Philip S. Thomas, University of Massachusetts, pthomas@cs.umass.edu; Scott Niekum, University of Texas at Austin, sniekum@cs.utexas.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a statement or link indicating that the source code for the described methodology (SOPE_n, W-SOPE_n, DR-SOPE_n) is open-source or publicly available. It mentions using the 'Caltech OPE Benchmarking Suite (COBS)' but does not mention releasing its own code.
Open Datasets | No | The paper mentions using the 'Graph and Toy Mountain Car environments' from the 'Caltech OPE Benchmarking Suite (COBS)', but does not provide concrete access information (link, DOI, specific citation) for a publicly available dataset used for training. It refers to environments and a benchmarking suite rather than specific datasets with access details.
Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or testing data. It only mentions 'different batch sizes of historical data'.
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models or memory specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. It mentions using the 'Caltech OPE Benchmarking Suite (COBS)' but does not specify its version or the versions of any other libraries or frameworks used.
Experiment Setup | Yes | The evaluation and behavior policies are e(a = 0) = 0.9 and b(a = 0) = 0.5 for the experiments on the Graph domain, and e(a = 0) = 0.5 and b(a = 0) = 0.6 for the Toy Mountain Car domain.
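As a rough illustration of the Graph-domain policies quoted in the Experiment Setup row, the Python sketch below computes per-step importance ratios and an ordinary (trajectory-wise) importance sampling (IS) estimate. It is not the paper's code and not one of its SOPE estimators. It assumes two actions and state-independent policies, so the second action's probability is 1 minus the quoted value. The toy reward and the helper names (step_ratio, is_estimate, sample_trajectory) are hypothetical.

```python
import random

# Action probabilities for the Graph domain as quoted in the table.
# ASSUMPTION: exactly two actions and state-independent policies, so the
# probability of action 1 is 1 minus the quoted probability of action 0.
PI_E = {0: 0.9, 1: 0.1}  # evaluation policy
PI_B = {0: 0.5, 1: 0.5}  # behavior policy


def step_ratio(action):
    """Per-step importance ratio pi_e(a) / pi_b(a)."""
    return PI_E[action] / PI_B[action]


def is_estimate(trajectories, gamma=1.0):
    """Ordinary importance sampling estimate of the evaluation policy's return.

    Each trajectory is a list of (action, reward) pairs collected under PI_B.
    The full-trajectory weight multiplies the whole discounted return.
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0
        ret = 0.0
        for t, (action, reward) in enumerate(traj):
            weight *= step_ratio(action)
            ret += (gamma ** t) * reward
        total += weight * ret
    return total / len(trajectories)


def sample_trajectory(rng, horizon=4):
    """HYPOTHETICAL toy data: reward 1 for action 0, reward 0 otherwise."""
    traj = []
    for _ in range(horizon):
        action = 0 if rng.random() < PI_B[0] else 1
        traj.append((action, 1.0 if action == 0 else 0.0))
    return traj


rng = random.Random(0)
data = [sample_trajectory(rng) for _ in range(10000)]
estimate = is_estimate(data)
print(estimate)
```

Under the toy reward, the evaluation policy's true expected return is 4 × 0.9 = 3.6, so the printed estimate should land near that value. The product of per-step ratios grows with the horizon, which is the source of the IS variance that the quoted abstract says the spectrum trades off against the bias of SIS.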