On Reward-Free Reinforcement Learning with Linear Function Approximation
Authors: Ruosong Wang, Simon S. Du, Lin Yang, Russ R. Salakhutdinov
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we give both positive and negative results for reward-free RL with linear function approximation. We give an algorithm for reward-free RL in the linear Markov decision process setting... The sample complexity of our algorithm is polynomial in the feature dimension and the planning horizon... We further give an exponential lower bound for reward-free RL in the setting where only the optimal Q-function admits a linear representation. |
| Researcher Affiliation | Academia | Correspondence to: Ruosong Wang <ruosongw@andrew.cmu.edu>, Simon S. Du <ssdu@cs.washington.edu>, Lin F. Yang <linyang@ee.ucla.edu>, Ruslan Salakhutdinov <rsalakhu@cs.cmu.edu>. Carnegie Mellon University; University of Washington, Seattle; University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1 Reward-Free RL for Linear MDPs: Exploration Phase; Algorithm 2 Reward-Free RL for Linear MDPs: Planning Phase |
| Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that code for the described methodology is being released. |
| Open Datasets | No | The paper is theoretical and describes algorithms and sample complexity bounds for abstract settings (e.g., linear MDPs, linear Q-functions), rather than conducting empirical studies on specific named datasets. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments on datasets, thus no training/test/validation splits are provided. |
| Hardware Specification | No | The paper is theoretical and does not report on empirical experiments, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and describes algorithms and proofs; it does not list specific software dependencies with version numbers for implementation or execution. |
| Experiment Setup | No | The paper is theoretical and presents algorithms; it does not include details on an experimental setup such as hyperparameters or training configurations for empirical evaluation. |