Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Simultaneous Statistical Inference for Off-Policy Evaluation in Reinforcement Learning

Authors: Tianpai Luo, Xinyuan Fan, Weichi Wu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The effectiveness of the proposed approach is demonstrated through simulations and analysis of the Ohio T1DM dataset.
Researcher Affiliation | Academia | Tianpai Luo, Xinyuan Fan, Weichi Wu; Department of Statistics and Data Science, Tsinghua University, Beijing 100084, China; EMAIL, EMAIL
Pseudocode | Yes | We propose the Gaussian multiplier bootstrap algorithm to circumvent these problems and derive a feasible SCR in Algorithm 1.
Open Source Code | Yes | The code is available at https://github.com/xinyuanfan01/Simultaneous-Statistical-Inference-for-Off-Policy-Evaluation-in-Reinforcement-Learning.
Open Datasets | Yes | The effectiveness of the proposed approach is demonstrated through simulations and analysis of the Ohio T1DM dataset.
Dataset Splits | Yes | The downloaded dataset has been separated as training group and testing group. Our objective is to conduct the simultaneous OPE on the testing group under the target policies obtained from the training group.
Hardware Specification | Yes | The experiments can be readily conducted on a standard workstation, for example, an Apple M1 machine with 16 GB of RAM running macOS Sonoma.
Software Dependencies | No | The basis functions are constructed using the tensor product of K Legendre or spline functions. The number of basis functions is determined through cross-validation (Qiu et al. 2021).
Experiment Setup | Yes | In our settings, the state vector $S_{0,t}$ may not have bounded support. To address this, we apply a sigmoid transformation, defined as $\mathrm{sigmoid}(S_{0,t}^{(j)}) = \frac{1}{1+\exp(-S_{0,t}^{(j)})}$ for $1 \le j \le d$, to obtain features with bounded support. The basis functions are constructed using the tensor product of K Legendre or spline functions. The number of basis functions is determined through cross-validation (Qiu et al. 2021). We put the detailed cross-validation procedure in Section D of the supplement.
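
The Experiment Setup entry describes a two-step feature construction: a coordinate-wise sigmoid transformation followed by a tensor product of K Legendre functions. The following is a minimal sketch of that idea, not the authors' implementation. The function names (`sigmoid`, `legendre_values`, `tensor_basis`) are hypothetical, and rescaling the sigmoid output from (0, 1) to (-1, 1) before evaluating the Legendre polynomials is an assumption made here; the paper may instead use Legendre functions shifted to [0, 1], or splines.

```python
import itertools
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def legendre_values(x, K):
    """Return [P_0(x), ..., P_{K-1}(x)] using Bonnet's recurrence."""
    values = [1.0, x][:K]
    for k in range(1, K - 1):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values


def tensor_basis(state, K):
    """Map a d-dimensional state to K**d tensor-product Legendre features.

    Assumption (not from the paper): sigmoid outputs are rescaled to
    (-1, 1), the natural domain of the Legendre polynomials.
    """
    bounded = [2.0 * sigmoid(s) - 1.0 for s in state]
    per_dim = [legendre_values(u, K) for u in bounded]
    # One feature per choice of polynomial degree in each coordinate.
    return [math.prod(combo) for combo in itertools.product(*per_dim)]


if __name__ == "__main__":
    features = tensor_basis([0.3, -2.0], K=3)
    print(len(features))
```

With d state coordinates and K functions per coordinate, the feature vector has K**d entries, which is why the number of basis functions is a tuning choice; the source states that it is selected by cross-validation.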