Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
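Validation against a manually labeled gold set, as described in the notice, typically amounts to comparing the LLM's labels to the human labels and reporting aggregate and per-class agreement. The following is a minimal illustrative sketch of that comparison; the function name, the label values, and the example data are hypothetical and not taken from the pipeline in [1].

```python
def validation_metrics(llm_labels, gold_labels):
    """Compare automated LLM classifications against manually labeled gold data.

    Returns overall accuracy and per-class recall (of the items a human
    labeled with class X, the fraction the LLM also labeled X).
    """
    assert len(llm_labels) == len(gold_labels), "label lists must align"
    correct = sum(p == g for p, g in zip(llm_labels, gold_labels))
    accuracy = correct / len(gold_labels)
    per_class_recall = {}
    for cls in set(gold_labels):
        indices = [i for i, g in enumerate(gold_labels) if g == cls]
        per_class_recall[cls] = sum(llm_labels[i] == cls for i in indices) / len(indices)
    return accuracy, per_class_recall

# Hypothetical labels for a binary variable such as "Open Source Code"
gold = ["Yes", "No", "No", "Yes", "No"]
pred = ["Yes", "No", "Yes", "Yes", "No"]
acc, recall = validation_metrics(pred, gold)
print(acc)  # 0.8 (4 of 5 agree)
```

Per-class recall matters here because the classes are imbalanced: most theoretical papers get "No" for empirical variables, so overall accuracy alone can mask errors on the rarer "Yes" class.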

Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs

Authors: Yuan Cheng, Ruiquan Huang, Yingbin Liang, Jing Yang

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | In this work, we first provide the first known sample complexity lower bound that holds for any algorithm under low-rank MDPs. We then propose a novel model-based algorithm, coined RAFFLE, and show it can both find an ϵ-optimal policy and achieve an ϵ-accurate system identification via reward-free exploration, with a sample complexity significantly improving the previous results. Such a sample complexity matches our lower bound in the dependence on ϵ, as well as on K in the large d regime, where d and K respectively denote the representation dimension and action space cardinality. |
| Researcher Affiliation | Academia | Yuan Cheng (University of Science and Technology of China, EMAIL); Ruiquan Huang (The Pennsylvania State University, EMAIL); Jing Yang (The Pennsylvania State University, EMAIL); Yingbin Liang (The Ohio State University, EMAIL) |
| Pseudocode | Yes | Algorithm 1: RAFFLE (RewArd-Free Feature LEarning) |
| Open Source Code | No | The paper does not include any explicit statement about providing open-source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, thus it does not mention the public availability or specific access information for any dataset. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with datasets, therefore it does not provide details on dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not involve empirical experiments; therefore, it does not provide any hardware specifications for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not involve empirical experiments; therefore, it does not list specific software dependencies with version numbers for replication. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and theoretical sample complexity; it does not describe an empirical experimental setup with specific hyperparameters or training configurations. |