Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
Authors: Masatoshi Uehara, Wen Sun
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present an algorithm named Constrained Pessimistic Policy Optimization (CPPO) which leverages a general function class and uses a constraint over the model class to encode pessimism. Under the assumption that the ground truth model belongs to our function class (i.e., realizability in the function class), CPPO has a PAC guarantee with offline data only providing partial coverage, i.e., it can learn a policy that competes against any policy that is covered by the offline data. We then demonstrate the flexibility of CPPO... Our theoretical results provide a sharp contrast between model-based and model-free approaches in offline RL. |
| Researcher Affiliation | Academia | Masatoshi Uehara, Wen Sun Department of Computer Science Cornell University, Ithaca, NY 14850, USA EMAIL |
| Pseudocode | Yes | Algorithm 1 Constrained Pessimistic Policy Optimization (CPPO) |
| Open Source Code | No | The paper does not include any statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper is theoretical and focuses on providing PAC guarantees and theoretical analysis under partial coverage. It mentions using an "offline dataset D" but does not describe using a specific, publicly available dataset with concrete access information for empirical training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not present empirical experiments. Therefore, there is no mention of dataset splits (training, validation, test) for reproducibility. |
| Hardware Specification | No | The paper is theoretical and does not describe running empirical experiments. Therefore, no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not describe running empirical experiments. Therefore, no specific software dependencies with version numbers are provided. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees. It does not describe an empirical experimental setup, hyperparameters, or system-level training settings. |