reproducibilityindex.ai

On Reward-Free Reinforcement Learning with Linear Function Approximation

Authors: Ruosong Wang, Simon S. Du, Lin Yang, Russ R. Salakhutdinov

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this work, we give both positive and negative results for reward-free RL with linear function approximation. We give an algorithm for reward-free RL in the linear Markov decision process setting... The sample complexity of our algorithm is polynomial in the feature dimension and the planning horizon... We further give an exponential lower bound for reward-free RL in the setting where only the optimal Q-function admits a linear representation.
Researcher Affiliation	Academia	Correspondence to: Ruosong Wang <ruosongw@andrew.cmu.edu>, Simon S. Du <ssdu@cs.washington.edu>, Lin F. Yang <linyang@ee.ucla.edu>, Ruslan Salakhutdinov <rsalakhu@cs.cmu.edu>. Carnegie Mellon University; University of Washington, Seattle; University of California, Los Angeles
Pseudocode	Yes	Algorithm 1 Reward-Free RL for Linear MDPs: Exploration Phase; Algorithm 2 Reward-Free RL for Linear MDPs: Planning Phase
Open Source Code	No	The paper does not provide any links to open-source code or explicitly state that code for the described methodology is being released.
Open Datasets	No	The paper is theoretical and describes algorithms and sample complexity bounds for abstract settings (e.g., linear MDPs, linear Q-functions), rather than conducting empirical studies on specific named datasets.
Dataset Splits	No	The paper is theoretical and does not conduct empirical experiments on datasets; thus, no training/test/validation splits are provided.
Hardware Specification	No	The paper is theoretical and does not report on empirical experiments; therefore, no hardware specifications are mentioned.
Software Dependencies	No	The paper is theoretical and describes algorithms and proofs; it does not list specific software dependencies with version numbers for implementation or execution.
Experiment Setup	No	The paper is theoretical and presents algorithms; it does not include details on an experimental setup such as hyperparameters or training configurations for empirical evaluation.
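In reward-free RL, an exploration phase (Algorithm 1 above) collects data without observing any reward. A planning phase (Algorithm 2) then computes a policy once a reward function is revealed. The paper's pseudocode is not reproduced on this page. As a rough illustration of the kind of computation a planning phase over linear features performs, the following minimal Python sketch implements generic backward least-squares value iteration (LSVI) with an elliptical bonus. It assumes a fixed dataset of (s, a, s_next) transitions per step, a finite action set, and a known feature map. The function name `lsvi_plan`, the data format, the regularizer `lam`, and the bonus scale `beta` are hypothetical choices. This sketch is not the authors' Algorithm 2.

```python
import numpy as np

def lsvi_plan(data, phi, reward, actions, H, lam=1.0, beta=1.0):
    """Generic backward LSVI with an elliptical bonus (illustrative sketch only).

    Hypothetical interface, not taken from the paper:
      data[h] : non-empty list of (s, a, s_next) transitions observed at step h
      phi     : feature map phi(s, a) -> np.ndarray of shape (d,)
      reward  : reward(h, s, a) -> float, revealed only at planning time
    Returns a list Q where Q[h](s, a) is the optimistic Q-estimate at step h.
    """
    Q = [None] * H
    V_next = lambda s: 0.0  # value after the last step is zero
    for h in range(H - 1, -1, -1):
        feats = np.array([phi(s, a) for (s, a, _) in data[h]])
        d = feats.shape[1]
        # Regularized Gram matrix Lambda_h = lam * I + sum_i phi_i phi_i^T
        Lam = lam * np.eye(d) + feats.T @ feats
        # Regression targets: estimated next-step values V_{h+1}(s_next)
        targets = np.array([V_next(s2) for (_, _, s2) in data[h]])
        w = np.linalg.solve(Lam, feats.T @ targets)  # ridge-regression weights
        Lam_inv = np.linalg.inv(Lam)

        def Q_h(s, a, h=h, w=w, Lam_inv=Lam_inv):
            f = phi(s, a)
            bonus = beta * np.sqrt(f @ Lam_inv @ f)  # elliptical uncertainty bonus
            # Truncate at H, the largest achievable return
            return min(reward(h, s, a) + f @ w + bonus, H)

        Q[h] = Q_h
        # Greedy value at step h, used as the regression target for step h - 1
        V_next = lambda s, Q_h=Q_h: max(Q_h(s, a) for a in actions)
    return Q
```

For example, a two-state, two-action problem with one-hot features (a tabular MDP is a special case of a linear MDP) can be planned as follows. Here `beta=0.0` turns off the bonus, so the sketch reduces to plain ridge-regression value iteration:

```python
phi = lambda s, a: np.eye(4)[2 * s + a]
reward = lambda h, s, a: 1.0 if s == 1 else 0.0
data = [[(s, a, a) for s in (0, 1) for a in (0, 1) for _ in range(100)] for h in range(2)]
Q = lsvi_plan(data, phi, reward, actions=(0, 1), H=2, lam=1.0, beta=0.0)
```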