Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs
Authors: Yuan Cheng, Ruiquan Huang, Yingbin Liang, Jing Yang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we first provide the first known sample complexity lower bound that holds for any algorithm under low-rank MDPs. We then propose a novel model-based algorithm, coined RAFFLE, and show it can both find an ϵ-optimal policy and achieve an ϵ-accurate system identification via reward-free exploration, with a sample complexity significantly improving the previous results. Such a sample complexity matches our lower bound in the dependence on ϵ, as well as on K in the large d regime, where d and K respectively denote the representation dimension and action space cardinality. |
| Researcher Affiliation | Academia | Yuan Cheng (University of Science and Technology of China); Ruiquan Huang (The Pennsylvania State University); Jing Yang (The Pennsylvania State University); Yingbin Liang (The Ohio State University) |
| Pseudocode | Yes | Algorithm 1 RAFFLE (RewArd-Free Feature LEarning) |
| Open Source Code | No | The paper does not include any explicit statement about providing open-source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, thus it does not mention the public availability or specific access information for any dataset. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with datasets, therefore it does not provide details on dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not involve empirical experiments; therefore, it does not provide any hardware specifications for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not involve empirical experiments; therefore, it does not list specific software dependencies with version numbers for replication. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and theoretical sample complexity; it does not describe an empirical experimental setup with specific hyperparameters or training configurations. |