Bilinear Classes: A Structural Framework for Provable Generalization in RL

Authors: Simon Du, Sham Kakade, Jason Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work introduces Bilinear Classes, a new structural framework that permits generalization in reinforcement learning across a wide variety of settings through the use of function approximation. Our main result provides an RL algorithm with polynomial sample complexity for Bilinear Classes; notably, this sample complexity is stated in terms of a reduction to the generalization error of an underlying supervised learning sub-problem. These bounds nearly match the best known sample complexity bounds for existing models. Furthermore, the framework extends to the infinite-dimensional (RKHS) setting: for the Linear Q*/V* model, linear MDPs, and linear mixture MDPs, we provide sample complexities that have no explicit dependence on the feature dimension (which could be infinite) and instead depend only on information-theoretic quantities. (The defining bilinear condition is sketched after this table.)
Researcher Affiliation | Academia | Simon S. Du (University of Washington), Sham M. Kakade (University of Washington), Jason D. Lee (Princeton University), Shachar Lovett (University of California, San Diego), Gaurav Mahajan (University of California, San Diego), Wen Sun (Cornell University), Ruosong Wang (Carnegie Mellon University).
Pseudocode | Yes | Algorithm 1: BiLin-UCB. (A hedged sketch of the algorithm's outer loop follows this table.)
Open Source Code | No | No explicit statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not mention using or providing access to any specific datasets for empirical evaluation.
Dataset Splits | No | The paper is theoretical and does not conduct experiments on datasets, thus no training/validation/test splits are provided.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations.
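For context on the "Bilinear Classes" structure referenced in the Research Type row: the central structural condition, as best it can be paraphrased here (notation is simplified and hypothetical; the exact statement, including the discrepancy function and the on-policy versus estimation-policy variants, is in the paper's Bilinear Class definition), bounds the average Bellman error of a hypothesis f under its own roll-in distribution by a bilinear form:

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Paraphrased bilinear condition (simplified, hypothetical notation; the
% paper's definition also specifies a discrepancy function and how actions
% at step h are sampled):
\[
\Bigl|\,\mathbb{E}_{\pi_f}\bigl[\,Q_{h,f}(s_h,a_h) - r_h - V_{h+1,f}(s_{h+1})\,\bigr]\Bigr|
\;\le\;
\bigl|\bigl\langle\, W_h(f) - W_h(f^{\star}),\; X_h(f) \,\bigr\rangle\bigr|,
\qquad h = 1,\dots,H,
\]
where $\pi_f$ is the greedy policy of hypothesis $f$, $f^{\star}$ is the
ground-truth hypothesis, and $W_h, X_h$ map hypotheses into a (possibly
infinite-dimensional) Hilbert space whose information-theoretic complexity,
rather than its ambient dimension, drives the sample complexity.
\end{document}
```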
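The Pseudocode row refers to Algorithm 1 (BiLin-UCB). Below is a minimal sketch of a constrained-optimism outer loop in the spirit of that algorithm; all names here (Hypothesis, collect_batch, estimate_discrepancy, the finite hypothesis set) are illustrative stand-ins rather than the paper's API, and the inner optimization and discrepancy estimator are assumed to be supplied as oracles.

```python
"""Hedged sketch of a BiLin-UCB-style outer loop (not the paper's exact
algorithm): pick the most optimistic hypothesis consistent with past data,
roll out its greedy policy, and turn the collected batch into a new constraint."""

from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Hypothesis:
    """A candidate model f: exposes its predicted value at the start state and
    its greedy policy (both assumed computable for this sketch)."""
    value_at_start: float
    policy: Callable[[object, int], int]  # (state, level h) -> action; used by collect_batch


def bilin_ucb_sketch(
    hypotheses: Sequence[Hypothesis],
    collect_batch: Callable[[Hypothesis], object],               # roll out f.policy, return a data batch
    estimate_discrepancy: Callable[[Hypothesis, object], float],  # empirical discrepancy of f on a batch
    num_iterations: int,
    radius: float,
) -> List[Hypothesis]:
    """Each round: select the highest-value hypothesis whose summed squared
    empirical discrepancy over all past batches stays within the confidence
    radius (constrained optimism), then execute its greedy policy to create a
    new batch/constraint. Returns the sequence of selected hypotheses."""
    batches: List[object] = []
    selected: List[Hypothesis] = []
    for _ in range(num_iterations):
        feasible = [
            f for f in hypotheses
            if sum(estimate_discrepancy(f, b) ** 2 for b in batches) <= radius ** 2
        ]
        if not feasible:  # in the analysis the true hypothesis stays feasible; guard for the sketch
            break
        f_t = max(feasible, key=lambda f: f.value_at_start)  # optimism over the feasible set
        selected.append(f_t)
        batches.append(collect_batch(f_t))  # new constraint from rolling out pi_{f_t}
    return selected
```

In the paper the constrained maximization is an oracle call over an abstract hypothesis class and the returned policies are aggregated into the final output; this sketch only reproduces the outer loop over a finite, enumerable set.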