reproducibilityindex.ai

When is Realizability Sufficient for Off-Policy Reinforcement Learning?

Authors: Andrea Zanette

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error, and that depend on the interplay of three factors. The first two are well known: they are the metric entropy of the function class and the concentrability coefficient that represents the cost of learning off-policy. The third factor is new, and it measures the violation of Bellman completeness, namely the misalignment between the chosen function class and its image through the Bellman operator. Our analysis directly applies to the solution found by temporal difference algorithms when they converge.
Researcher Affiliation	Academia	Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, United States of America. Correspondence to: Andrea Zanette <zanette@berkeley.edu>.
Pseudocode	No	The paper describes algorithms such as Fitted Q and the minimax formulation using textual descriptions and mathematical equations, but it does not include structured pseudocode blocks or clearly labeled algorithm boxes.
Open Source Code	No	The paper is theoretical and focuses on analysis rather than a new implementation, and therefore does not mention providing open-source code for its methodology.
Open Datasets	No	This paper is theoretical and does not conduct empirical experiments using specific datasets. It discusses theoretical concepts related to data distributions but does not mention publicly available datasets for experimental evaluation.
Dataset Splits	No	This paper is theoretical and does not involve empirical experiments with dataset splits for training, validation, or testing.
Hardware Specification	No	This paper is theoretical and does not mention any hardware specifications used for experiments.
Software Dependencies	No	This paper is theoretical and does not describe any specific software dependencies with version numbers.
Experiment Setup	No	This paper is theoretical and does not describe any experimental setup details, hyperparameters, or training configurations.