When is Realizability Sufficient for Off-Policy Reinforcement Learning?

Author: Andrea Zanette

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error, and that depend on the interplay of three factors. The first two are well known: they are the metric entropy of the function class and the concentrability coefficient that represents the cost of learning off-policy. The third factor is new, and it measures the violation of Bellman completeness, namely the misalignment between the chosen function class and its image through the Bellman operator. Our analysis directly applies to the solution found by temporal difference algorithms when they converge.
Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, United States of America. Correspondence to: Andrea Zanette <zanette@berkeley.edu>.
Pseudocode | No | The paper describes algorithms such as Fitted Q and the minimax formulation through textual descriptions and mathematical equations, but it does not include structured pseudocode blocks or clearly labeled algorithm boxes.
Open Source Code | No | The paper is theoretical and focuses on analysis rather than a new implementation; it does not mention providing open-source code for its methodology.
Open Datasets | No | The paper is theoretical and does not conduct empirical experiments on specific datasets. It discusses theoretical concepts related to data distributions but mentions no publicly available datasets for experimental evaluation.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and does not mention any hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not list any software dependencies or version numbers.
Experiment Setup | No | The paper is theoretical and does not describe experimental setup details, hyperparameters, or training configurations.
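For context on the quantities named in the abstract, the inherent Bellman error is conventionally defined as follows in the approximate dynamic programming literature; the notation here is a standard convention, not quoted from the paper itself.

```latex
% Standard definition; notation is illustrative, not the paper's own.
\[
  \mathcal{E}(\mathcal{F})
    \;=\; \sup_{f \in \mathcal{F}} \inf_{g \in \mathcal{F}}
          \lVert g - \mathcal{T} f \rVert ,
\]
% where $\mathcal{T}$ denotes the Bellman operator. Bellman completeness
% is the special case $\mathcal{E}(\mathcal{F}) = 0$, i.e.
% $\mathcal{T}\mathcal{F} \subseteq \mathcal{F}$: the function class is
% closed under the Bellman operator. The paper's third factor measures
% how far the chosen class $\mathcal{F}$ is from satisfying this.
```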
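Since the paper describes the Fitted Q algorithm only in text and equations, the following is a minimal illustrative sketch of tabular Fitted Q-iteration on an offline (off-policy) transition dataset. This is an assumption-laden toy version for intuition only, not the paper's analysis or function-approximation setting; the tabular "fit" step (empirical mean per state-action pair) plays the role of the least-squares regression.

```python
import numpy as np

def fitted_q_iteration(transitions, n_states, n_actions,
                       gamma=0.9, n_iters=50):
    """Tabular Fitted Q-iteration on an offline dataset.

    transitions: list of (s, a, r, s2) tuples collected off-policy.
    Each iteration regresses Q(s, a) onto the bootstrapped targets
    r + gamma * max_a' Q(s2, a'); in the tabular case the least-squares
    fit reduces to an empirical mean over each (s, a) cell.
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        targets = np.zeros_like(Q)
        counts = np.zeros_like(Q)
        for s, a, r, s2 in transitions:
            # Bootstrapped regression target using the previous Q.
            targets[s, a] += r + gamma * Q[s2].max()
            counts[s, a] += 1
        # "Fit" step: average target per visited (s, a) pair.
        mask = counts > 0
        Q[mask] = targets[mask] / counts[mask]
    return Q

# Tiny deterministic MDP (hypothetical example): action 1 in state 0
# yields reward 1 and moves to the absorbing state 1; everything else
# gives reward 0.
transitions = [(0, 0, 0.0, 0), (0, 1, 1.0, 1),
               (1, 0, 0.0, 1), (1, 1, 0.0, 1)]
Q = fitted_q_iteration(transitions, n_states=2, n_actions=2, gamma=0.9)
```

On this toy dataset the iteration converges to the optimal action values: taking action 1 in state 0 is worth 1, and delaying one step discounts that by gamma.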