Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs

Authors: Yuan Cheng, Ruiquan Huang, Yingbin Liang, Jing Yang

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this work, we first provide the first known sample complexity lower bound that holds for any algorithm under low-rank MDPs. We then propose a novel model-based algorithm, coined RAFFLE, and show it can both find an ϵ-optimal policy and achieve an ϵ-accurate system identification via reward-free exploration, with a sample complexity significantly improving the previous results. Such a sample complexity matches our lower bound in the dependence on ϵ, as well as on K in the large d regime, where d and K respectively denote the representation dimension and action space cardinality.
Researcher Affiliation Academia Yuan Cheng University of Science and Technology of China cy16@mail.ustc.edu.cn Ruiquan Huang The Pennsylvania State University rzh5514@psu.edu Jing Yang The Pennsylvania State University yangjing@psu.edu Yingbin Liang The Ohio State University liang.889@osu.edu
Pseudocode Yes Algorithm 1 RAFFLE (Rew Ard-Free Feature LEarning)
Open Source Code No The paper does not include any explicit statement about providing open-source code or a link to a code repository for the described methodology.
Open Datasets No The paper is theoretical and does not conduct experiments on datasets, thus it does not mention the public availability or specific access information for any dataset.
Dataset Splits No The paper is theoretical and does not conduct experiments with datasets, therefore it does not provide details on dataset splits for training, validation, or testing.
Hardware Specification No The paper is theoretical and does not involve empirical experiments; therefore, it does not provide any hardware specifications for running experiments.
Software Dependencies No The paper is theoretical and does not involve empirical experiments; therefore, it does not list specific software dependencies with version numbers for replication.
Experiment Setup No The paper is theoretical and focuses on algorithm design and theoretical sample complexity; it does not describe an empirical experimental setup with specific hyperparameters or training configurations.