Bilinear Classes: A Structural Framework for Provable Generalization in RL
Authors: Simon Du, Sham Kakade, Jason Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This work introduces Bilinear Classes, a new structural framework that permits generalization in reinforcement learning in a wide variety of settings through the use of function approximation. Our main result provides an RL algorithm with polynomial sample complexity for Bilinear Classes; notably, this sample complexity is stated in terms of a reduction to the generalization error of an underlying supervised learning sub-problem. These bounds nearly match the best known sample complexity bounds for existing models. Furthermore, the framework extends to the infinite-dimensional (RKHS) setting: for the linear Q*/V* model, linear MDPs, and linear mixture MDPs, we provide sample complexities that have no explicit dependence on the feature dimension (which could be infinite) but instead depend only on information-theoretic quantities. |
| Researcher Affiliation | Academia | Simon S. Du¹, Sham M. Kakade¹, Jason D. Lee², Shachar Lovett³, Gaurav Mahajan³, Wen Sun⁴, Ruosong Wang⁵ (¹University of Washington; ²Princeton University; ³University of California, San Diego; ⁴Cornell University; ⁵Carnegie Mellon University) |
| Pseudocode | Yes | Algorithm 1: BiLin-UCB |
| Open Source Code | No | No explicit statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not mention using or providing access to any specific datasets for empirical evaluation. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments on datasets, thus no training/validation/test splits are provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations. |
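
Illustrative sketch (not from the paper, which releases no code). The bilinear structure behind the framework can be seen in its simplest special case. This sketch assumes a finite-horizon tabular MDP treated as a linear MDP with one-hot features φ(s, a). Under that reading, take a hypothesis f with Q-functions Q_f and greedy policy π_f. Then W_h(f) is the per-(s, a) Bellman residual of f, and X_h(f) = E_{π_f}[φ(s_h, a_h)] is the step-h occupancy of π_f. The average Bellman error of f at step h then factors as ⟨W_h(f) − W_h(f*), X_h(f)⟩, with W_h(f*) = 0 for the optimal Q*. All function names (random_mdp, backup, W, X, bellman_error, optimal_Q, bilinear_form) and the random toy instance are hypothetical. The code only checks this identity numerically and does not implement BiLin-UCB.

```python
import random

# Hypothetical helpers illustrating one instance of the bilinear structure:
# a finite-horizon tabular MDP viewed as a linear MDP with one-hot phi(s, a).

def random_mdp(S, A, H, seed=0):
    """Toy MDP: P[h][s][a] is a next-state distribution, R[h][s][a] lies in [0, 1]."""
    rng = random.Random(seed)

    def dist():
        w = [rng.random() + 1e-3 for _ in range(S)]
        z = sum(w)
        return [x / z for x in w]

    P = [[[dist() for _ in range(A)] for _ in range(S)] for _ in range(H)]
    R = [[[rng.random() for _ in range(A)] for _ in range(S)] for _ in range(H)]
    return P, R

def v_next(Q, h):
    """V_{h+1}(s) = max_a Q_{h+1}(s, a); zero past the last step."""
    S = len(Q[0])
    return [0.0] * S if h + 1 == len(Q) else [max(Q[h + 1][s]) for s in range(S)]

def greedy(Q, h, s):
    """Greedy policy pi_f(s) at step h (ties go to the lowest action index)."""
    return max(range(len(Q[h][s])), key=lambda a: Q[h][s][a])

def backup(Q, P, R, h, s, a):
    """One-step target r_h(s, a) + E_{s' ~ P_h(.|s, a)} V_{h+1}(s')."""
    return R[h][s][a] + sum(p * v for p, v in zip(P[h][s][a], v_next(Q, h)))

def W(Q, P, R, h):
    """W_h(f): per-(s, a) Bellman residual Q_h(s, a) - backup, a vector in R^(S*A)."""
    return [[Q[h][s][a] - backup(Q, P, R, h, s, a) for a in range(len(Q[h][s]))]
            for s in range(len(Q[h]))]

def state_dist(Q, P, h, rho):
    """Distribution of s_h when rolling in with pi_f from the start distribution rho."""
    d = list(rho)
    for t in range(h):
        nxt = [0.0] * len(d)
        for s, ps in enumerate(d):
            for s2, p in enumerate(P[t][s][greedy(Q, t, s)]):
                nxt[s2] += ps * p
        d = nxt
    return d

def X(Q, P, h, rho):
    """X_h(f) = E_{pi_f}[phi(s_h, a_h)]: the (s, a) occupancy for one-hot phi."""
    d = state_dist(Q, P, h, rho)
    return [[d[s] if a == greedy(Q, h, s) else 0.0 for a in range(len(Q[h][s]))]
            for s in range(len(d))]

def bellman_error(Q, P, R, h, rho):
    """Average Bellman error of f at step h under its own roll-in distribution."""
    d = state_dist(Q, P, h, rho)
    return sum(ps * (Q[h][s][greedy(Q, h, s)] - backup(Q, P, R, h, s, greedy(Q, h, s)))
               for s, ps in enumerate(d))

def optimal_Q(P, R):
    """Q*_h by backward induction on the Bellman optimality equations."""
    H, S, A = len(P), len(P[0]), len(P[0][0])
    Q = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    for h in reversed(range(H)):
        for s in range(S):
            for a in range(A):
                Q[h][s][a] = backup(Q, P, R, h, s, a)
    return Q

def bilinear_form(Qf, Qstar, P, R, h, rho):
    """<W_h(f) - W_h(f*), X_h(f)>, summed over all (s, a) coordinates."""
    Wf, Ws, Xf = W(Qf, P, R, h), W(Qstar, P, R, h), X(Qf, P, h, rho)
    return sum((Wf[s][a] - Ws[s][a]) * Xf[s][a]
               for s in range(len(Xf)) for a in range(len(Xf[s])))

S, A, H = 4, 3, 5
P, R = random_mdp(S, A, H, seed=1)
rho = [1.0 / S] * S
Qstar = optimal_Q(P, R)
rng = random.Random(2)
Qf = [[[rng.uniform(0, H) for _ in range(A)] for _ in range(S)] for _ in range(H)]
for h in range(H):
    # The two columns agree up to floating-point rounding.
    print(h, bellman_error(Qf, P, R, h, rho), bilinear_form(Qf, Qstar, P, R, h, rho))
```

In this toy case, X_h(f) depends on f only through the roll-in policy π_f, while W_h(f) carries the hypothesis-specific error. That separation is the structure the framework abstracts beyond the tabular setting.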