When is Realizability Sufficient for Off-Policy Reinforcement Learning?
Authors: Andrea Zanette
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as inherent Bellman error, and that depend on the interplay of three factors. The first two are well known: they are the metric entropy of the function class and the concentrability coefficient that represents the cost of learning off-policy. The third factor is new, and it measures the violation of Bellman completeness, namely the misalignment between the chosen function class and its image through the Bellman operator. Our analysis directly applies to the solution found by temporal difference algorithms when they converge. |
| Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, United States of America. Correspondence to: Andrea Zanette <zanette@berkeley.edu>. |
| Pseudocode | No | The paper describes algorithms such as Fitted Q and the minimax formulation using textual descriptions and mathematical equations, but it does not include structured pseudocode blocks or clearly labeled algorithm boxes. |
| Open Source Code | No | The paper is theoretical and focuses on analysis rather than a new implementation, and therefore does not mention providing open-source code for its methodology. |
| Open Datasets | No | This paper is theoretical and does not conduct empirical experiments using specific datasets. It discusses theoretical concepts related to data distributions but does not mention publicly available datasets for experimental evaluation. |
| Dataset Splits | No | This paper is theoretical and does not involve empirical experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | This paper is theoretical and does not mention any hardware specifications used for experiments. |
| Software Dependencies | No | This paper is theoretical and does not describe any specific software dependencies with version numbers. |
| Experiment Setup | No | This paper is theoretical and does not describe any experimental setup details, hyperparameters, or training configurations. |