Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations
Authors: Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We establish both polynomial runtime complexity and polynomial sample complexity for this algorithm, without additional assumptions on the data-generating model. In particular, we prove that the algorithm incurs an Õ(δ_F H²√T) regret, where δ_F characterizes the intrinsic complexity of the function class F, H is the length of each episode, and T is the total number of episodes. Our regret bounds are independent of the number of states, a result which exhibits clearly the benefit of function approximation in RL. To the best of our knowledge, this is the first provably efficient framework for reinforcement learning with kernel and neural network function approximations. |
| Researcher Affiliation | Academia | Zhuoran Yang, Princeton University; Chi Jin, Princeton University; Zhaoran Wang, Northwestern University; Mengdi Wang, Princeton University; Michael I. Jordan, University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1 Optimistic Least-Squares Value Iteration with Function Approximation |
| Open Source Code | No | The paper does not provide any statements about open-source code for the described methodology or links to code repositories. |
| Open Datasets | No | This is a theoretical paper and does not describe experiments using specific datasets, nor does it provide information about dataset availability or access. |
| Dataset Splits | No | This is a theoretical paper and does not describe experiments using specific datasets. Therefore, no dataset split information (train/validation/test) is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe experiments requiring specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe experiments requiring specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs, not experimental evaluation. Therefore, no experimental setup details like hyperparameters or training settings are provided. |
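The pseudocode row above references Algorithm 1, Optimistic Least-Squares Value Iteration with Function Approximation. As a rough illustration only, the sketch below implements the linear special case (LSVI with a UCB-style exploration bonus) on a hypothetical toy MDP; the paper's actual algorithm uses kernel and neural function classes, and the feature map, bonus scale `beta`, and ridge parameter `lam` here are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions, horizon H = 3.
rng = np.random.default_rng(0)
S, A, H = 3, 2, 3
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = distribution over next states
R = rng.uniform(size=(S, A))                # rewards in [0, 1]

d = S * A
beta, lam = 0.1, 1.0  # assumed bonus scale and ridge parameter

def phi(s, a):
    """One-hot feature map over state-action pairs (linear special case)."""
    v = np.zeros(d)
    v[s * A + a] = 1.0
    return v

def run_lsvi_ucb(K=50):
    """Run K episodes of optimistic least-squares value iteration."""
    data = [[] for _ in range(H)]  # per-step transitions (s, a, s')
    total_reward = 0.0
    for _ in range(K):
        # Backward pass: ridge regression on Bellman targets plus a UCB bonus.
        Q = np.zeros((H + 1, S, A))
        for h in range(H - 1, -1, -1):
            Lam = lam * np.eye(d)
            b = np.zeros(d)
            for (s, a, s2) in data[h]:
                f = phi(s, a)
                Lam += np.outer(f, f)
                b += (R[s, a] + Q[h + 1, s2].max()) * f
            w = np.linalg.solve(Lam, b)
            Lam_inv = np.linalg.inv(Lam)
            for s in range(S):
                for a in range(A):
                    f = phi(s, a)
                    bonus = beta * np.sqrt(f @ Lam_inv @ f)
                    Q[h, s, a] = min(f @ w + bonus, H)  # values are at most H
        # Forward pass: act greedily w.r.t. the optimistic Q and collect data.
        s = 0
        for h in range(H):
            a = int(np.argmax(Q[h, s]))
            total_reward += R[s, a]
            s2 = int(rng.choice(S, p=P[s, a]))
            data[h].append((s, a, s2))
            s = s2
    return total_reward / K  # average per-episode return

avg = run_lsvi_ucb()
```

Because rewards lie in [0, 1] and each episode has H = 3 steps, the average return is bounded between 0 and 3; optimism (the bonus term) drives exploration before the least-squares estimates become accurate.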