Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Simultaneous Statistical Inference for Off-Policy Evaluation in Reinforcement Learning

Authors: Tianpai Luo, Xinyuan Fan, Weichi Wu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The effectiveness of the proposed approach is demonstrated through simulations and analysis of the Ohio T1DM dataset.
Researcher Affiliation	Academia	Tianpai Luo, Xinyuan Fan, Weichi Wu. Department of Statistics and Data Science, Tsinghua University, Beijing 100084, China.
Pseudocode	Yes	We propose the Gaussian multiplier bootstrap algorithm to circumvent these problems and derive a feasible SCR in Algorithm 1.
Open Source Code	Yes	The code is available at https://github.com/xinyuanfan01/Simultaneous-Statistical-Inference-for-Off-Policy-Evaluation-in-Reinforcement-Learning.
Open Datasets	Yes	The effectiveness of the proposed approach is demonstrated through simulations and analysis of the Ohio T1DM dataset.
Dataset Splits	Yes	The downloaded dataset has been separated as training group and testing group. Our objective is to conduct the simultaneous OPE on the testing group under the target policies obtained from the training group.
Hardware Specification	Yes	The experiments can be readily conducted on a standard workstation, for example, an Apple M1 machine with 16 GB of RAM running macOS Sonoma.
Software Dependencies	No	The basis functions are constructed using the tensor product of K Legendre or spline functions. The number of basis functions is determined through cross-validation (Qiu et al. 2021).
Experiment Setup	Yes	In our settings, the state vector $S_{0,t}$ may not have bounded support. To address this, we apply a sigmoid transformation, defined as $\mathrm{sigmoid}(S_{0,t}^{(j)}) = \frac{1}{1+\exp(-S_{0,t}^{(j)})}$ for $1 \le j \le d$, to obtain features with bounded support. The basis functions are constructed using the tensor product of K Legendre or spline functions. The number of basis functions is determined through cross-validation (Qiu et al. 2021). We put the detailed cross-validation procedure in Section D of the supplement.
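
The feature construction quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code (see the repository linked in the Open Source Code row for that). The function names `sigmoid` and `legendre_tensor_basis` are hypothetical. The sketch uses Legendre polynomials only, takes K as a fixed argument instead of selecting it by cross-validation, and assumes the sigmoid output in (0, 1) is rescaled to (-1, 1) before the polynomials are evaluated, a detail the excerpt does not state.

```python
import itertools

import numpy as np
from numpy.polynomial import legendre


def sigmoid(x):
    """Elementwise sigmoid, mapping real-valued states into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def legendre_tensor_basis(states, K):
    """Tensor-product Legendre features for an (n, d) array of states.

    Hypothetical helper: returns an (n, K**d) array whose columns are
    products of one Legendre polynomial (degree 0..K-1) per coordinate.
    """
    states = np.atleast_2d(np.asarray(states, dtype=float))
    n, d = states.shape

    # Assumption: rescale the bounded sigmoid output from (0, 1) to (-1, 1),
    # the interval on which Legendre polynomials are orthogonal.
    u = 2.0 * sigmoid(states) - 1.0

    # per_dim[j] has shape (n, K): degrees 0..K-1 evaluated on coordinate j.
    per_dim = [legendre.legvander(u[:, j], K - 1) for j in range(d)]

    columns = []
    for degrees in itertools.product(range(K), repeat=d):
        col = np.ones(n)
        for j, k in enumerate(degrees):
            col = col * per_dim[j][:, k]
        columns.append(col)
    return np.column_stack(columns)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.normal(size=(5, 2))          # five 2-dimensional states
    Phi = legendre_tensor_basis(S, K=3)  # 3**2 = 9 features per state
    print(Phi.shape)
```

The number of features grows as K**d, so a full tensor product is only practical for low-dimensional states; this is one reason the choice of K matters and is tuned rather than fixed in the paper.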