reproducibilityindex.ai

Bilinear Classes: A Structural Framework for Provable Generalization in RL

Authors: Simon Du, Sham Kakade, Jason Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	This work introduces Bilinear Classes, a new structural framework, which permit generalization in reinforcement learning in a wide variety of settings through the use of function approximation. Our main result provides an RL algorithm which has polynomial sample complexity for Bilinear Classes; notably, this sample complexity is stated in terms of a reduction to the generalization error of an underlying supervised learning sub-problem. These bounds nearly match the best known sample complexity bounds for existing models. Furthermore, this framework also extends to the infinite dimensional (RKHS) setting: for the Linear Q*/V* model, linear MDPs, and linear mixture MDPs, we provide sample complexities that have no explicit dependence on the explicit feature dimension (which could be infinite), but instead depend only on information theoretic quantities.
Researcher Affiliation	Academia	Simon S. Du¹, Sham M. Kakade¹, Jason D. Lee², Shachar Lovett³, Gaurav Mahajan³, Wen Sun⁴, Ruosong Wang⁵. ¹University of Washington, ²Princeton University, ³University of California, San Diego, ⁴Cornell University, ⁵Carnegie Mellon University.
Pseudocode	Yes	Algorithm 1: BiLin-UCB
Open Source Code	No	No explicit statement or link indicating the availability of open-source code for the described methodology.
Open Datasets	No	The paper is theoretical and does not mention using or providing access to any specific datasets for empirical evaluation.
Dataset Splits	No	The paper is theoretical and does not conduct experiments on datasets, thus no training/validation/test splits are provided.
Hardware Specification	No	The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies	No	The paper is theoretical and does not list any specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations.