Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Authors: Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We establish both polynomial runtime complexity and polynomial sample complexity for this algorithm, without additional assumptions on the data-generating model. In particular, we prove that the algorithm incurs an Õ(δ_F H² √T) regret, where δ_F characterizes the intrinsic complexity of the function class F, H is the length of each episode, and T is the total number of episodes. Our regret bounds are independent of the number of states, a result which clearly exhibits the benefit of function approximation in RL. To the best of our knowledge, this is the first provably efficient framework for reinforcement learning with kernel and neural network function approximations.
Researcher Affiliation | Academia | Zhuoran Yang, Princeton University, zy6@princeton.edu; Chi Jin, Princeton University, chij@princeton.edu; Zhaoran Wang, Northwestern University, zhaoran.wang@northwestern.edu; Mengdi Wang, Princeton University, mengdiw@princeton.edu; Michael I. Jordan, University of California, Berkeley, jordan@cs.berkeley.edu
Pseudocode | Yes | Algorithm 1: Optimistic Least-Squares Value Iteration with Function Approximation
Open Source Code | No | The paper does not provide any statements about open-source code for the described methodology, nor links to code repositories.
Open Datasets | No | This is a theoretical paper; it does not describe experiments on specific datasets or provide information about dataset availability or access.
Dataset Splits | No | This is a theoretical paper and does not describe experiments on datasets, so no train/validation/test split information is provided.
Hardware Specification | No | The paper is theoretical and does not describe experiments requiring specific hardware, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not describe experiments requiring specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical, focusing on algorithm design and proofs rather than experimental evaluation, so no setup details such as hyperparameters or training settings are provided.
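Algorithm 1 (optimistic least-squares value iteration with function approximation) appears in the paper only as pseudocode. As a rough illustration of its kernel instantiation, the sketch below performs a single backward update: a kernel ridge regression fit on past state-action data, plus a posterior-variance-style exploration bonus, with the optimistic Q-estimate truncated to [0, H]. The RBF kernel choice and the parameter names `lam`, `beta`, and `gamma` are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF (Gaussian) kernel matrix between row-stacked feature sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return np.exp(-gamma * d2)

def kernel_lsvi_step(Z, y, z_query, lam=1.0, beta=1.0, H=10.0, gamma=1.0):
    """One optimistic LSVI update with kernel ridge regression (illustrative sketch).

    Z:       (n, d) past state-action features
    y:       (n,)   regression targets (reward plus next-step value)
    z_query: (m, d) state-action pairs at which to evaluate the optimistic Q-value
    Returns optimistic Q-estimates truncated to [0, H].
    """
    K = rbf_kernel(Z, Z, gamma) + lam * np.eye(len(Z))
    alpha = np.linalg.solve(K, y)                 # ridge-regression coefficients
    k_q = rbf_kernel(z_query, Z, gamma)           # (m, n) cross-kernel
    mean = k_q @ alpha                            # least-squares estimate
    # Exploration bonus: sqrt of the GP-style posterior variance at each query.
    prior = np.ones(len(z_query))                 # k(z, z) = 1 for the RBF kernel
    var = np.maximum(prior - np.einsum('ij,jk,ik->i', k_q, np.linalg.inv(K), k_q), 0.0)
    return np.clip(mean + beta * np.sqrt(var), 0.0, H)
```

Queries far from the observed data receive a larger bonus (variance near the kernel's prior value), which is what drives optimism-based exploration in this family of algorithms.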