On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL
Authors: Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | "We study reward-free reinforcement learning (RL) under general non-linear function approximation, and establish sample efficiency and hardness results under various standard structural assumptions. On the positive side, we propose the RFOLIVE (Reward-Free OLIVE) algorithm for sample-efficient reward-free exploration under minimal structural assumptions... On the negative side, we provide a statistical hardness result... Our analyses indicate that the explorability or reachability assumptions, previously made for the latter two settings, are not necessary statistically for reward-free exploration." "Remark: Similar to its counterparts in the reward-aware general function approximation setting (Jiang et al., 2017; Dann et al., 2018; Jin et al., 2021; Du et al., 2021), RFOLIVE is in general not computationally efficient. We leave addressing computational tractability as a future direction. Our focus is statistical efficiency." |
| Researcher Affiliation | Collaboration | Jinglin Chen, Department of Computer Science, University of Illinois Urbana-Champaign (jinglinc@illinois.edu); Aditya Modi, Microsoft (admodi@umich.edu); Akshay Krishnamurthy, Microsoft Research (akshaykr@microsoft.com); Nan Jiang, Department of Computer Science, University of Illinois Urbana-Champaign (nanjiang@illinois.edu); Alekh Agarwal, Google Research (alekhagarwal@google.com) |
| Pseudocode | Yes | The paper presents pseudocode as "Algorithm 1 RFOLIVE (F, ε, δ): Reward-Free OLIVE". |
| Open Source Code | No | The paper contains no statement about releasing open-source code for the described methodology, and it includes no links to code repositories. |
| Open Datasets | No | The paper is theoretical and does not involve experimental data. Therefore, no information about publicly available training datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve experimental data. Therefore, no information about training, validation, or test splits is provided. |
| Hardware Specification | No | The paper focuses on theoretical contributions and does not report on any experimental setup or specific hardware used. |
| Software Dependencies | No | The paper is theoretical and does not describe software dependencies with version numbers, as it does not report on practical implementations or experiments. |
| Experiment Setup | No | The paper is theoretical and introduces an algorithm. It does not include details on experimental setup, hyperparameters, or system-level training settings. |