Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
Authors: Ming Yin, Yu-Xiang Wang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using data collected by a behavior policy µ. In particular, we consider the sample complexity of offline RL for finite-horizon MDPs. Prior works study this problem under different data-coverage assumptions, and their learning guarantees are expressed via covering coefficients that lack an explicit characterization of system quantities. In this work, we analyze the Adaptive Pessimistic Value Iteration (APVI) algorithm and derive a suboptimality upper bound that nearly matches... As a complement, we also prove a per-instance information-theoretic lower bound under the weak assumption that d^µ_h(s_h, a_h) > 0 whenever d^π_h(s_h, a_h) > 0. Unlike previous minimax lower bounds, the per-instance lower bound (via local minimaxity) is a much stronger criterion, as it applies to individual instances separately. |
| Researcher Affiliation | Academia | Ming Yin¹·² and Yu-Xiang Wang¹ — ¹Department of Computer Science, UC Santa Barbara; ²Department of Statistics and Applied Probability, UC Santa Barbara |
| Pseudocode | Yes | Algorithm 1 Adaptive (assumption-free) Pessimistic Value Iteration or LCBVI-Bernstein |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm analysis and deriving bounds. It does not mention or use any specific publicly available datasets for training, nor does it provide access information for any dataset. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with data splits for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processors, or cloud computing specifications used for running experiments. It is a theoretical paper. |
| Software Dependencies | No | The paper does not list any specific software dependencies or their version numbers (e.g., programming languages, libraries, frameworks, or solvers). |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with concrete hyperparameter values, training configurations, or system-level settings for empirical evaluation. |
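The table notes that the paper's only algorithmic artifact is pseudocode ("Algorithm 1: Adaptive (assumption-free) Pessimistic Value Iteration or LCBVI-Bernstein") and that no source code is released. Purely as an illustration of the general technique, here is a minimal tabular sketch of finite-horizon value iteration with a Bernstein-style lower-confidence bonus subtracted during the backup. The function name, bonus constants, and data layout are our assumptions for this sketch, not the paper's exact algorithm:

```python
import numpy as np

def pessimistic_value_iteration(counts, rewards, H, n_states, n_actions, delta=0.05):
    """Illustrative LCB-style pessimistic value iteration (tabular, finite horizon).

    counts[h][s, a, s'] : visit counts of (s, a, s') at step h in the offline data
    rewards[h][s, a]    : known mean reward in [0, 1] (assumed given here)
    Subtracts a Bernstein-style bonus from the backed-up value, so the returned
    policy is greedy w.r.t. a lower confidence bound on the value.
    """
    V = np.zeros((H + 1, n_states))
    pi = np.zeros((H, n_states), dtype=int)
    log_term = np.log(2 * H * n_states * n_actions / delta)
    for h in range(H - 1, -1, -1):
        n_sa = counts[h].sum(axis=2)  # N_h(s, a)
        # Empirical transition model; uniform placeholder where (s, a) is unvisited.
        P_hat = np.where(n_sa[..., None] > 0,
                         counts[h] / np.maximum(n_sa, 1)[..., None],
                         1.0 / n_states)
        EV = P_hat @ V[h + 1]                    # E_{s' ~ P_hat}[V_{h+1}(s')]
        var = (P_hat @ V[h + 1] ** 2) - EV ** 2  # Var_{s' ~ P_hat}[V_{h+1}(s')]
        n_safe = np.maximum(n_sa, 1)
        # Bernstein-style bonus: variance-aware term plus a lower-order range term.
        bonus = np.sqrt(2 * np.maximum(var, 0) * log_term / n_safe) + H * log_term / n_safe
        Q = np.clip(rewards[h] + EV - bonus, 0.0, H)  # pessimistic (LCB) Q-values
        Q[n_sa == 0] = 0.0  # assumption-free flavor: unvisited pairs get zero value
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return pi, V
```

On well-covered state-action pairs the bonus shrinks at the usual 1/sqrt(N) rate, so the pessimistic values converge to the empirical backup; unvisited pairs are pinned to zero, mirroring the assumption-free treatment described for Algorithm 1.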