Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning
Authors: Yaqi Duan, Chi Jin, Zhiyuan Li
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper considers batch Reinforcement Learning (RL) with general value function approximation. Our study investigates the minimal assumptions to reliably estimate/minimize Bellman error, and characterizes the generalization performance by (local) Rademacher complexities of general function classes, which makes initial steps in bridging the gap between statistical learning theory and batch RL. Concretely, we view the Bellman error as a surrogate loss for the optimality gap, and prove the followings: (1) In double sampling regime, the excess risk of Empirical Risk Minimizer (ERM) is bounded by the Rademacher complexity of the function class. (2) In the single sampling regime, sample-efficient risk minimization is not possible without further assumptions, regardless of algorithms. However, with completeness assumptions, the excess risk of FQI and a minimax style algorithm can be again bounded by the Rademacher complexity of the corresponding function classes. (3) Fast statistical rates can be achieved by using tools of local Rademacher complexity. Our analysis covers a wide range of function classes, including finite classes, linear spaces, kernel spaces, sparse linear features, etc. |
| Researcher Affiliation | Academia | 1Department of Operations Research and Financial Engineering, Princeton University 2Department of Electrical and Computer Engineering, Princeton University 3Department of Computer Science, Princeton University. |
| Pseudocode | Yes | Algorithm 1 FQI |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not describe empirical experiments involving datasets or their public availability. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments involving data splits. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware details used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not provide specific ancillary software details with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific hyperparameters or training configurations. |