Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Provable Zero-Shot Generalization in Offline Reinforcement Learning

Authors: Zhiyong Wang, Chen Yang, John C.S. Lui, Dongruo Zhou

ICML 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | "We propose two meta-algorithms called pessimistic empirical risk minimization (PERM) and pessimistic proximal policy optimization (PPPO) that enable ZSG for offline RL (Jin et al., 2021). ... Our result shows that the sub-optimalities of the output policies are bounded by both the supervised learning error, which is controlled by the number of different environments, and the reinforcement learning error, which is controlled by the coverage of the offline dataset to the optimal policy. Please refer to Table 1 for a summary of our results. To the best of our knowledge, our proposed algorithms are the first offline RL methods that provably enjoy the ZSG property. ... Next we propose a theoretical analysis of PERM. ... Theorem 9 Set the Evaluation subroutine in Algorithm 2 as PPE (Algorithm 1). ... the output π_PERM of Algorithm 2 satisfies SubOpt(π_PERM) ..." |
| Researcher Affiliation | Academia | "1 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China; 2 Department of Computer Science, Indiana University Bloomington, Bloomington, USA." |
| Pseudocode | Yes | "Algorithm 1 Pessimistic Policy Evaluation (PPE) ... Algorithm 2 Pessimistic Empirical Risk Minimization (PERM) ... Algorithm 3 Pessimistic Proximal Policy Optimization (PPPO) ... Algorithm 4 Pessimistic Policy Evaluation (PPE): Linear MDP" |
| Open Source Code | No | The paper does not contain any statements or links regarding the release of source code for the methodology described. |
| Open Datasets | No | The paper discusses theoretical properties of offline datasets and a data collection process, but it does not specify or provide access information for any concrete publicly available dataset used for experimental validation. |
| Dataset Splits | No | The paper is theoretical and does not describe any experimental setup involving dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers required for replication. |
| Experiment Setup | No | The paper is theoretical and does not describe any specific experimental setup details or hyperparameters. |
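The pessimism principle behind the paper's PERM and PPPO meta-algorithms — penalizing value estimates by their uncertainty so that poorly covered actions are avoided — can be illustrated with a toy one-step example. The sketch below is NOT the paper's algorithms: `pessimistic_value_estimate`, the `beta` penalty scale, and the count-based bonus are illustrative assumptions, showing only the generic lower-confidence-bound idea used in pessimistic offline RL.

```python
import math
from collections import defaultdict

def pessimistic_value_estimate(dataset, num_actions, beta=1.0):
    """Toy one-step pessimistic (lower-confidence-bound) policy extraction.

    dataset: list of (state, action, reward) tuples logged by a behavior policy.
    For each state, pick the action maximizing mean reward minus an
    uncertainty penalty beta / sqrt(n), where n is the visit count.
    Actions never observed in the data get maximal pessimism (-inf).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for s, a, r in dataset:
        sums[(s, a)] += r
        counts[(s, a)] += 1

    policy = {}
    for s in {s for s, _, _ in dataset}:
        best_action, best_lcb = None, -math.inf
        for a in range(num_actions):
            n = counts[(s, a)]
            if n == 0:
                lcb = -math.inf  # uncovered action: rule it out entirely
            else:
                mean = sums[(s, a)] / n
                lcb = mean - beta / math.sqrt(n)  # subtract uncertainty bonus
            if lcb > best_lcb:
                best_action, best_lcb = a, lcb
        policy[s] = best_action
    return policy
```

With 100 samples of a mediocre action and a single lucky sample of another, the penalty makes the well-covered action win, mirroring the role that dataset coverage of the optimal policy plays in the paper's suboptimality bounds.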