Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage

Authors: Masatoshi Uehara, Wen Sun

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We present an algorithm named Constrained Pessimistic Policy Optimization (CPPO) which leverages a general function class and uses a constraint over the model class to encode pessimism. Under the assumption that the ground truth model belongs to our function class (i.e., realizability in the function class), CPPO has a PAC guarantee with offline data only providing partial coverage, i.e., it can learn a policy that competes against any policy that is covered by the offline data. We then demonstrate the flexibility of CPPO... Our theoretical results provide a sharp contrast between model-based and model-free approaches in offline RL.
Researcher Affiliation | Academia | Masatoshi Uehara, Wen Sun; Department of Computer Science, Cornell University, Ithaca, NY 14850, USA; {mu223,ws455}@cornell.edu
Pseudocode | Yes | Algorithm 1: Constrained Pessimistic Policy Optimization (CPPO). An illustrative sketch of the algorithm's max-min structure follows this table.
Open Source Code | No | The paper does not include any statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | No | The paper is theoretical and focuses on providing PAC guarantees and theoretical analysis under partial coverage. It mentions using an "offline dataset D" but does not describe using a specific, publicly available dataset with concrete access information for empirical training or evaluation.
Dataset Splits | No | The paper is theoretical and does not present empirical experiments, so there is no mention of dataset splits (training, validation, test) for reproducibility.
Hardware Specification | No | The paper is theoretical and does not describe running empirical experiments, so no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and does not describe running empirical experiments, so no specific software dependencies with version numbers are provided.
Experiment Setup | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees. It does not describe an empirical experimental setup, hyperparameters, or system-level training settings.
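
To make the constraint-based pessimism referenced in the Pseudocode row concrete, here is a minimal, hypothetical Python sketch of CPPO's max-min structure as described in the abstract: restrict attention to models that fit the offline data well, then return the policy with the best worst-case value over that restricted set. The likelihood-based constraint and every helper here (`log_likelihood`, `policy_value`, the finite tabular model class) are illustrative assumptions, not the paper's actual instantiation or its guarantees.

```python
# Hypothetical sketch of CPPO's constrained max-min structure, based only on
# the abstract: pessimism is encoded as a constraint over the model class,
# and the returned policy competes against the worst model in that set.
import numpy as np

def log_likelihood(model, dataset):
    """Empirical log-likelihood of observed transitions (s, a, r, s') under
    `model`, a tabular transition tensor P[s, a, s']."""
    return sum(np.log(model[s, a, s_next] + 1e-12)
               for (s, a, _r, s_next) in dataset)

def policy_value(model, reward, policy, gamma=0.99, horizon=200, s0=0):
    """Finite-horizon estimate of the discounted value of `policy` in `model`
    from initial state s0. `policy` is a stochastic table pi[s, a]."""
    n_states, n_actions, _ = model.shape
    dist = np.zeros(n_states)
    dist[s0] = 1.0
    value = 0.0
    for t in range(horizon):
        # Expected one-step reward under the current state distribution.
        value += gamma**t * sum(dist[s] * policy[s, a] * reward[s, a]
                                for s in range(n_states)
                                for a in range(n_actions))
        # Push the state distribution one step forward through the model.
        dist = np.einsum("s,sa,sap->p", dist, policy, model)
    return value

def cppo(model_class, reward, policies, dataset, slack):
    """Illustrative CPPO:
    1. keep models whose data likelihood is within `slack` of the best fit
       (under realizability, the ground-truth model should survive this cut);
    2. return the policy maximizing the worst-case value over that set."""
    best_ll = max(log_likelihood(M, dataset) for M in model_class)
    version_set = [M for M in model_class
                   if log_likelihood(M, dataset) >= best_ll - slack]
    return max(policies,
               key=lambda pi: min(policy_value(M, reward, pi)
                                  for M in version_set))
```

In the paper's setting, the inner minimization runs over a general function class rather than a finite enumeration, and the constraint slack is chosen so that realizability keeps the ground-truth model inside the constrained set with high probability; the finite lists of models and policies above are purely for illustration.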