Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations
Authors: Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We establish both polynomial runtime complexity and polynomial sample complexity for this algorithm, without additional assumptions on the data-generating model. In particular, we prove that the algorithm incurs an Õ(δ_F H² √T) regret, where δ_F characterizes the intrinsic complexity of the function class F, H is the length of each episode, and T is the total number of episodes. Our regret bounds are independent of the number of states, a result which exhibits clearly the benefit of function approximation in RL. To the best of our knowledge, this is the first provably efficient framework for reinforcement learning with kernel and neural network function approximations. |
| Researcher Affiliation | Academia | Zhuoran Yang (Princeton University, zy6@princeton.edu); Chi Jin (Princeton University, chij@princeton.edu); Zhaoran Wang (Northwestern University, zhaoran.wang@northwestern.edu); Mengdi Wang (Princeton University, mengdiw@princeton.edu); Michael I. Jordan (University of California, Berkeley, jordan@cs.berkeley.edu) |
| Pseudocode | Yes | Algorithm 1 Optimistic Least-Squares Value Iteration with Function Approximation |
| Open Source Code | No | The paper does not provide any statements about open-source code for the described methodology or links to code repositories. |
| Open Datasets | No | This is a theoretical paper and does not describe experiments using specific datasets, nor does it provide information about dataset availability or access. |
| Dataset Splits | No | This is a theoretical paper and does not describe experiments using specific datasets. Therefore, no dataset split information (train/validation/test) is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe experiments requiring specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe experiments requiring specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs, not experimental evaluation. Therefore, no experimental setup details like hyperparameters or training settings are provided. |
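The pseudocode referenced above (Algorithm 1, Optimistic Least-Squares Value Iteration) fits Q-values by regularized least squares over a function class and adds an exploration bonus before acting greedily. The sketch below illustrates the core per-step update in the linear-feature special case: `Phi` holds past state-action features, `y` the Bellman regression targets, and `beta`/`lam` are the bonus and ridge parameters (names chosen here for illustration, not taken from the paper). The kernel and neural variants the paper analyzes replace this ridge regression with kernel ridge regression or a neural fit.

```python
import numpy as np

def optimistic_q_estimate(Phi, y, phi_query, beta, lam=1.0, H=10):
    """One ridge-regression Q-fit with a UCB-style exploration bonus.

    Phi:       (n, d) array of state-action features from past transitions.
    y:         (n,)   regression targets r + max_a' Q_{h+1}(s', a').
    phi_query: (d,)   feature of the state-action pair to evaluate.
    Returns the optimistic Q-value, truncated at H (the episode length).
    """
    d = Phi.shape[1]
    Lam = lam * np.eye(d) + Phi.T @ Phi           # regularized Gram matrix
    w = np.linalg.solve(Lam, Phi.T @ y)           # ridge least-squares weights
    # Elliptical-potential bonus: large for directions with little data.
    bonus = beta * np.sqrt(phi_query @ np.linalg.solve(Lam, phi_query))
    return float(min(w @ phi_query + bonus, H))   # optimism, clipped at H
```

Because the bonus is nonnegative, the estimate with `beta > 0` never falls below the plain ridge fit, which is the optimism property the regret analysis relies on.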