Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations
Authors: Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We establish both polynomial runtime complexity and polynomial sample complexity for this algorithm, without additional assumptions on the data-generating model. In particular, we prove that the algorithm incurs an Õ(δ_F H² √T) regret, where δ_F characterizes the intrinsic complexity of the function class F, H is the length of each episode, and T is the total number of episodes. Our regret bounds are independent of the number of states, a result which exhibits clearly the benefit of function approximation in RL. To the best of our knowledge, this is the first provably efficient framework for reinforcement learning with kernel and neural network function approximations. |
| Researcher Affiliation | Academia | Zhuoran Yang (Princeton University, zy6@princeton.edu); Chi Jin (Princeton University, chij@princeton.edu); Zhaoran Wang (Northwestern University, zhaoran.wang@northwestern.edu); Mengdi Wang (Princeton University, mengdiw@princeton.edu); Michael I. Jordan (University of California, Berkeley, jordan@cs.berkeley.edu) |
| Pseudocode | Yes | Algorithm 1 Optimistic Least-Squares Value Iteration with Function Approximation |
| Open Source Code | No | The paper does not provide any statements about open-source code for the described methodology or links to code repositories. |
| Open Datasets | No | This is a theoretical paper and does not describe experiments using specific datasets, nor does it provide information about dataset availability or access. |
| Dataset Splits | No | This is a theoretical paper and does not describe experiments using specific datasets. Therefore, no dataset split information (train/validation/test) is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe experiments requiring specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe experiments requiring specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs, not experimental evaluation. Therefore, no experimental setup details like hyperparameters or training settings are provided. |
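The pseudocode referenced above (Algorithm 1, Optimistic Least-Squares Value Iteration) fits Q-values by regularized least squares over a function class and adds an exploration bonus before acting greedily. The sketch below illustrates the core per-step update in the linear-feature special case: `Phi` holds past state-action features, `y` the Bellman regression targets, and `beta`/`lam` are the bonus and ridge parameters (names chosen here for illustration, not taken from the paper). The kernel and neural variants the paper analyzes replace this ridge regression with kernel ridge regression or a neural fit.

```python
import numpy as np

def optimistic_q_estimate(Phi, y, phi_query, beta, lam=1.0, H=10):
    """One ridge-regression Q-fit with a UCB-style exploration bonus.

    Phi:       (n, d) array of state-action features from past transitions.
    y:         (n,)   regression targets r + max_a' Q_{h+1}(s', a').
    phi_query: (d,)   feature of the state-action pair to evaluate.
    Returns the optimistic Q-value, truncated at H (the episode length).
    """
    d = Phi.shape[1]
    Lam = lam * np.eye(d) + Phi.T @ Phi           # regularized Gram matrix
    w = np.linalg.solve(Lam, Phi.T @ y)           # ridge least-squares weights
    # Elliptical-potential bonus: large for directions with little data.
    bonus = beta * np.sqrt(phi_query @ np.linalg.solve(Lam, phi_query))
    return float(min(w @ phi_query + bonus, H))   # optimism, clipped at H
```

Because the bonus is nonnegative, the estimate with `beta > 0` never falls below the plain ridge fit, which is the optimism property the regret analysis relies on.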