Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation
Authors: Weitong Zhang, Dongruo Zhou, Quanquan Gu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study model-based reward-free reinforcement learning with linear function approximation for episodic Markov decision processes (MDPs). We propose a new provably efficient algorithm, called UCRL-RFE, under the linear mixture MDP assumption... We show that to obtain an $\epsilon$-optimal policy for an arbitrary reward function, UCRL-RFE needs to sample at most $\widetilde{O}(H^5 d^2 \epsilon^{-2})$ episodes during the exploration phase. Here, $H$ is the length of the episode and $d$ is the dimension of the feature mapping. We also propose a variant of UCRL-RFE using a Bernstein-type bonus and show that it needs to sample at most $\widetilde{O}(H^4 d (H + d) \epsilon^{-2})$ episodes to achieve an $\epsilon$-optimal policy. By constructing a special class of linear mixture MDPs, we also prove that any reward-free algorithm needs to sample at least $\widetilde{\Omega}(H^2 d \epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy. Our upper bound matches the lower bound in the dependence on $\epsilon$, and in the dependence on $d$ if $H \ge d$. (The bounds are restated cleanly below the table.) ... "If you ran experiments... [N/A]" |
| Researcher Affiliation | Academia | Weitong Zhang, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, weightzero@cs.ucla.edu; Dongruo Zhou, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, drzhou@cs.ucla.edu; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, qgu@cs.ucla.edu |
| Pseudocode | Yes | Algorithm 1: UCRL-RFE Planning Module (PLAN) ... Algorithm 2: UCRL-RFE (Hoeffding Bonus) ... Algorithm 3: UCRL-RFE+ (Bernstein Bonus). (A toy sketch of the Hoeffding-bonus exploration loop appears after the table.) |
| Open Source Code | No | The paper does not explicitly state that it provides open-source code for the described methodology or link to a code repository. The 'checks' section states '3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]'. |
| Open Datasets | No | The paper is theoretical and conducts no empirical experiments; it neither uses any dataset nor discusses public dataset availability. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments; thus, it does not describe training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not report on experimental hardware. The 'checks' section indicates 'N/A' for experiments and associated details. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees. It does not describe any specific software dependencies with version numbers required to reproduce experiments. |
| Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup, such as specific hyperparameters or system-level training settings. |
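
For reference, here is a clean restatement of the sample-complexity bounds quoted (with garbled PDF extraction) in the Research Type row. This is a transcription of the paper's results, together with the one-line simplification for $H \ge d$, not a new claim:

```latex
% Sample complexities reported in the paper (transcription, not new results)
\begin{align*}
\text{UCRL-RFE (Hoeffding bonus):}\quad
    & \widetilde{O}\!\left(H^{5} d^{2}\,\epsilon^{-2}\right) \text{ episodes},\\
\text{UCRL-RFE}^{+}\text{ (Bernstein bonus):}\quad
    & \widetilde{O}\!\left(H^{4} d\,(H+d)\,\epsilon^{-2}\right)
      = \widetilde{O}\!\left(H^{5} d\,\epsilon^{-2}\right) \text{ when } H \ge d,\\
\text{Lower bound (any reward-free algorithm):}\quad
    & \widetilde{\Omega}\!\left(H^{2} d\,\epsilon^{-2}\right) \text{ episodes}.
\end{align*}
```

The Bernstein-bonus bound is the sharper one precisely in the regime $H \ge d$, where $H + d \le 2H$ makes it linear in $d$ and thus matching the lower bound's $d$ dependence.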
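Since the paper is purely theoretical and ships no implementation, the following is a minimal toy sketch of a Hoeffding-bonus exploration loop in the spirit of Algorithm 2 (UCRL-RFE). The feature construction, the pseudo-reward choice, the bonus radius `beta`, and the episode count are all illustrative assumptions, not the authors' specification:

```python
import numpy as np

# Toy sketch of reward-free exploration with a Hoeffding-type bonus in a
# linear mixture MDP, loosely following UCRL-RFE (Algorithm 2). Everything
# concrete here is an assumption made for illustration.

rng = np.random.default_rng(0)
S, A, H, d = 5, 3, 4, 6            # states, actions, horizon, feature dim

# Linear mixture MDP: P(s'|s,a) = <phi(s,a,s'), theta_star>.
phi = rng.dirichlet(np.ones(d), size=(S, A, S))   # (S, A, S, d) toy features
theta_star = rng.dirichlet(np.ones(d))            # unknown true parameter
P = phi @ theta_star                              # (S, A, S) transitions
P /= P.sum(axis=-1, keepdims=True)                # renormalize the toy model

phi_one = phi.sum(axis=2)    # (S, A, d): feature of the constant-1 function
Sigma = np.eye(d)            # ridge covariance: lambda*I + sum outer products
b_sum = np.zeros(d)          # accumulated feature * regression-target terms
beta = 1.0                   # bonus radius; the paper derives its exact value

for episode in range(200):
    theta_hat = np.linalg.solve(Sigma, b_sum)     # ridge estimate of theta
    Sigma_inv = np.linalg.inv(Sigma)

    # Optimistic value iteration: the exploration "reward" is an uncertainty
    # term (reward-free phase), plus a bonus on the value-weighted features.
    V = np.zeros((H + 1, S))
    Q = np.zeros((H, S, A))
    r_exp = beta * np.sqrt(
        np.einsum('sad,de,sae->sa', phi_one, Sigma_inv, phi_one))
    for h in reversed(range(H)):
        # phi_V(s,a) = sum_{s'} phi(s,a,s') * V_{h+1}(s'), shape (S, A, d)
        phi_V = np.einsum('saxd,x->sad', phi, V[h + 1])
        bonus = beta * np.sqrt(
            np.einsum('sad,de,sae->sa', phi_V, Sigma_inv, phi_V))
        Q[h] = np.clip(r_exp + phi_V @ theta_hat + bonus, 0.0, H)
        V[h] = Q[h].max(axis=1)

    # Roll out the greedy policy; update the ridge regression with features
    # phi_V(s_h, a_h) and target V_{h+1}(s_{h+1}).
    s = 0
    for h in range(H):
        a = int(Q[h, s].argmax())
        s_next = int(rng.choice(S, p=P[s, a]))
        feat = V[h + 1] @ phi[s, a]               # (d,) regression feature
        Sigma += np.outer(feat, feat)
        b_sum += feat * V[h + 1, s_next]
        s = s_next
```

The regression targets $V_{h+1}(s_{h+1})$ because, in a linear mixture MDP, $\mathbb{E}[V(s') \mid s, a] = \langle \phi_V(s,a), \theta^* \rangle$, so $\theta^*$ is identifiable from value-weighted features; the planning module (Algorithm 1) would then reuse the learned model with any externally supplied reward.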