Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

SOPE: Spectrum of Off-Policy Estimators

Authors: Christina Yuan, Yash Chandak, Stephen Giguere, Philip S. Thomas, Scott Niekum

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide empirical evidence that estimators in this spectrum can be used to trade-off between the bias and variance of IS and SIS and can achieve lower mean-squared error than both IS and SIS. Additionally, we also establish a spectrum for doubly-robust and weighted version of these estimators.
Researcher Affiliation	Academia	Christina J. Yuan, University of Texas at Austin; Yash Chandak, University of Massachusetts; Stephen Giguere, University of Texas at Austin; Philip S. Thomas, University of Massachusetts; Scott Niekum, University of Texas at Austin
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide a statement or link indicating that the source code for the described methodology (SOPE_n, W-SOPE_n, DR-SOPE_n) is open-source or publicly available. It mentions using the Caltech OPE Benchmarking Suite (COBS) but does not release its own code.
Open Datasets	No	The paper mentions using 'Graph and Toy Mountain Car environments' from the 'Caltech OPE Benchmarking Suite (COBS)', but does not provide concrete access information (link, DOI, specific citation) for a publicly available dataset used for training. It refers to environments and a benchmarking suite rather than specific datasets with access details.
Dataset Splits	No	The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or testing data. It only mentions 'different batch sizes of historical data'.
Hardware Specification	No	The paper does not provide specific hardware details (such as GPU/CPU models or memory specifications) used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers. It mentions using 'Caltech OPE Benchmarking Suite (COBS)' but does not specify its version or the versions of any other libraries or frameworks used.
Experiment Setup	Yes	The evaluation and behavior policies are π_e(a = 0) = 0.9 and π_b(a = 0) = 0.5 for the experiments on the Graph domain, and π_e(a = 0) = 0.5 and π_b(a = 0) = 0.6 for the Toy Mountain Car domain.
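The Research Type and Experiment Setup rows describe a spectrum of estimators between per-decision importance sampling (IS) and stationary importance sampling (SIS), evaluated with the policy probabilities listed above. The following is a minimal sketch of that idea, not the authors' code (none is released). It assumes a binary action space {0, 1}, so that π(a = 1) = 1 − π(a = 0), and it assumes the state-action distribution ratios w_t are supplied by the caller (for example, from some density-ratio estimator). The names `step_ratios`, `sope_n_return`, and `dist_ratios` are hypothetical. The weighting rule is our reading of SOPE_n: for t ≥ n, it uses the distribution ratio at step t − n times the last n per-step action ratios; for t < n, it falls back to the full per-decision IS product.

```python
from typing import Dict, List, Sequence


def action_probs(p_a0: float) -> Dict[int, float]:
    # Assumption: binary action space, so pi(a=1) = 1 - pi(a=0).
    return {0: p_a0, 1: 1.0 - p_a0}


def step_ratios(actions: Sequence[int], p_e0: float, p_b0: float) -> List[float]:
    # rho_t = pi_e(a_t) / pi_b(a_t) for every action taken in the trajectory.
    pe, pb = action_probs(p_e0), action_probs(p_b0)
    return [pe[a] / pb[a] for a in actions]


def sope_n_return(
    rewards: Sequence[float],
    rhos: Sequence[float],
    dist_ratios: Sequence[float],
    n: int,
    gamma: float = 1.0,
) -> float:
    """Weighted discounted return of one trajectory (hypothetical sketch).

    t >= n: weight_t = dist_ratios[t - n] * prod(rhos[t-n+1 .. t])
    t <  n: weight_t = prod(rhos[0 .. t])   (per-decision IS term)
    n = 0 uses only the distribution ratio (stationary IS);
    n >= len(rewards) reduces to per-decision IS.
    """
    total = 0.0
    for t, r in enumerate(rewards):
        if t >= n:
            weight = dist_ratios[t - n]
            for j in range(t - n + 1, t + 1):
                weight *= rhos[j]
        else:
            weight = 1.0
            for j in range(t + 1):
                weight *= rhos[j]
        total += (gamma ** t) * weight * r
    return total


if __name__ == "__main__":
    # Graph-domain policies from the Experiment Setup row.
    rhos = step_ratios([0, 0, 1], p_e0=0.9, p_b0=0.5)  # approx. [1.8, 1.8, 0.2]
    rewards = [1.0, 1.0, 1.0]
    w = [1.0, 1.0, 1.0]  # placeholder distribution ratios, not real estimates
    for n in range(len(rewards) + 1):
        print(n, sope_n_return(rewards, rhos, w, n))
```

Sweeping n from 0 to the horizon moves the estimate from the SIS endpoint to the per-decision IS endpoint. Shorter products of action ratios keep the weights from compounding, which is the mechanism behind the bias/variance trade-off the paper reports; how much bias the SIS end incurs depends entirely on the quality of the supplied distribution ratios.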