Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Provable Zero-Shot Generalization in Offline Reinforcement Learning
Authors: Zhiyong Wang, Chen Yang, John C.S. Lui, Dongruo Zhou
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose two meta-algorithms called pessimistic empirical risk minimization (PERM) and pessimistic proximal policy optimization (PPPO) that enable ZSG for offline RL (Jin et al., 2021). ... Our result shows that the sub-optimalities of the output policies are bounded by both the supervised learning error, which is controlled by the number of different environments, and the reinforcement learning error, which is controlled by the coverage of the offline dataset to the optimal policy. Please refer to Table 1 for a summary of our results. To the best of our knowledge, our proposed algorithms are the first offline RL methods that provably enjoy the ZSG property. ... Next we propose a theoretical analysis of PERM. ... Theorem 9 Set the Evaluation subroutine in Algorithm 2 as PPE (Algo. 1). ... the output π^PERM of Algorithm 2 satisfies SubOpt(π^PERM) ... |
| Researcher Affiliation | Academia | (1) Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China; (2) Department of Computer Science, Indiana University Bloomington, Bloomington, USA. |
| Pseudocode | Yes | Algorithm 1 Pessimistic Policy Evaluation (PPE) ... Algorithm 2 Pessimistic Empirical Risk Minimization (PERM) ... Algorithm 3 Pessimistic Proximal Policy Optimization (PPPO) ... Algorithm 4 Pessimistic Policy Evaluation (PPE): Linear MDP |
| Open Source Code | No | The paper does not contain any statements or links regarding the release of source code for the methodology described. |
| Open Datasets | No | The paper discusses theoretical properties of offline datasets and a data collection process, but it does not specify or provide access information for any concrete publicly available dataset used for experimental validation. |
| Dataset Splits | No | The paper is theoretical and does not describe any experimental setup involving dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers required for replication. |
| Experiment Setup | No | The paper is theoretical and does not describe any specific experimental setup details or hyperparameters. |