Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs
Authors: Yuan Cheng, Ruiquan Huang, Yingbin Liang, Jing Yang
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we first provide the first known sample complexity lower bound that holds for any algorithm under low-rank MDPs. We then propose a novel model-based algorithm, coined RAFFLE, and show it can both find an ϵ-optimal policy and achieve an ϵ-accurate system identification via reward-free exploration, with a sample complexity significantly improving the previous results. Such a sample complexity matches our lower bound in the dependence on ϵ, as well as on K in the large d regime, where d and K respectively denote the representation dimension and action space cardinality. |
| Researcher Affiliation | Academia | Yuan Cheng University of Science and Technology of China cy16@mail.ustc.edu.cn Ruiquan Huang The Pennsylvania State University rzh5514@psu.edu Jing Yang The Pennsylvania State University yangjing@psu.edu Yingbin Liang The Ohio State University liang.889@osu.edu |
| Pseudocode | Yes | Algorithm 1 RAFFLE (Rew Ard-Free Feature LEarning) |
| Open Source Code | No | The paper does not include any explicit statement about providing open-source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, thus it does not mention the public availability or specific access information for any dataset. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with datasets, therefore it does not provide details on dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not involve empirical experiments; therefore, it does not provide any hardware specifications for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not involve empirical experiments; therefore, it does not list specific software dependencies with version numbers for replication. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and theoretical sample complexity; it does not describe an empirical experimental setup with specific hyperparameters or training configurations. |