On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL

Authors: Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study reward-free reinforcement learning (RL) under general non-linear function approximation, and establish sample efficiency and hardness results under various standard structural assumptions. On the positive side, we propose the RFOLIVE (Reward-Free OLIVE) algorithm for sample-efficient reward-free exploration under minimal structural assumptions... On the negative side, we provide a statistical hardness result... Our analyses indicate that the explorability or reachability assumptions, previously made for the latter two settings, are not statistically necessary for reward-free exploration. Remark: like its counterparts in the reward-aware general function approximation setting (Jiang et al., 2017; Dann et al., 2018; Jin et al., 2021; Du et al., 2021), RFOLIVE is in general not computationally efficient; the paper's focus is statistical efficiency, and computational tractability is left as a future direction.
Researcher Affiliation | Collaboration | Jinglin Chen, Department of Computer Science, University of Illinois Urbana-Champaign, jinglinc@illinois.edu; Aditya Modi, Microsoft, admodi@umich.edu; Akshay Krishnamurthy, Microsoft Research, akshaykr@microsoft.com; Nan Jiang, Department of Computer Science, University of Illinois Urbana-Champaign, nanjiang@illinois.edu; Alekh Agarwal, Google Research, alekhagarwal@google.com
Pseudocode | Yes | Algorithm 1: RFOLIVE(F, ε, δ) (Reward-Free OLIVE)
Open Source Code | No | The paper contains no statement about releasing code for the described methodology, and it links to no code repository.
Open Datasets | No | The paper is theoretical and involves no experimental data, so no publicly available training datasets are reported.
Dataset Splits | No | The paper is theoretical and involves no experimental data, so no training, validation, or test splits are reported.
Hardware Specification | No | The paper focuses on theoretical contributions and reports no experimental setup or specific hardware.
Software Dependencies | No | The paper is theoretical and reports no practical implementation or experiments, so no software dependencies with version numbers are listed.
Experiment Setup | No | The paper is theoretical and introduces an algorithm; it gives no experimental setup, hyperparameters, or system-level training settings.
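For readers unfamiliar with the protocol the abstract describes, the sketch below illustrates the two-phase structure of reward-free RL in the simplest tabular setting. This is NOT RFOLIVE (which operates under general non-linear function approximation and is not computationally efficient); all sizes, the uniform exploration policy, and the function names are hypothetical choices made only to show the protocol: explore without observing rewards, then plan for a reward function revealed afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, H = 5, 2, 4  # states, actions, horizon (hypothetical sizes)
P = rng.dirichlet(np.ones(S), size=(S, A))  # true transition kernel P[s, a]

def explore(num_episodes):
    """Phase 1: collect transitions with a uniform policy; no reward is observed."""
    counts = np.zeros((S, A, S))
    for _ in range(num_episodes):
        s = 0
        for _ in range(H):
            a = rng.integers(A)
            s_next = rng.choice(S, p=P[s, a])
            counts[s, a, s_next] += 1
            s = s_next
    return counts

def plan(counts, reward):
    """Phase 2: value iteration on the empirical model, for a reward given post hoc."""
    n = counts.sum(axis=2, keepdims=True)
    # Empirical transition estimate; unvisited (s, a) pairs default to uniform.
    P_hat = np.where(n > 0, counts / np.maximum(n, 1), 1.0 / S)
    V = np.zeros(S)
    for _ in range(H):
        Q = reward + P_hat @ V  # Q[s, a] = r(s, a) + E_{s' ~ P_hat}[V(s')]
        V = Q.max(axis=1)
    return V[0]  # estimated optimal value from the start state

counts = explore(num_episodes=2000)
reward = rng.random((S, A))  # reward function revealed only after exploration
v_hat = plan(counts, reward)
```

The point of the protocol is that the exploration data is reward-agnostic: once `counts` is collected, `plan` can be rerun for any number of reward functions without further interaction with the environment.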