Towards Minimax Optimal Reward-free Reinforcement Learning in Linear MDPs

Authors: Pihe Hu, Yu Chen, Longbo Huang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove an $\tilde{O}(H^4 d^2/\epsilon^2)$ sample complexity upper bound for LSVI-RFE, where $H$ is the episode length and $d$ is the feature dimension. We also establish a sample complexity lower bound of $\Omega(H^3 d^2/\epsilon^2)$.
Researcher Affiliation | Academia | Pihe Hu, Yu Chen, Longbo Huang: Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China. {hph19,c-y19}@mails.tsinghua.edu.cn, longbohuang@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1 Least-Squares Value Iteration RFE (LSVI-RFE): Exploration Phase
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | This is a theoretical paper and does not involve empirical evaluation on datasets.
Dataset Splits | No | This is a theoretical paper and does not involve empirical evaluation on datasets, thus no dataset splits are provided.
Hardware Specification | No | The paper discusses computational complexity in theoretical terms (O-notation) but does not specify any particular hardware used for experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers.
Experiment Setup | No | This is a theoretical paper and does not describe an experimental setup with hyperparameters or training configurations.
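
The two sample complexity bounds quoted in the Research Type row can be compared term by term. The following is a minimal sketch of that arithmetic, not a statement taken from the paper: the names $N_{\mathrm{upper}}$ and $N_{\mathrm{lower}}$ are introduced here purely for illustration, and constants and logarithmic factors are suppressed.

```latex
% N_upper and N_lower are illustrative names, not notation from the paper.
\[
N_{\mathrm{upper}} = \tilde{O}\!\left(\frac{H^4 d^2}{\epsilon^2}\right),
\qquad
N_{\mathrm{lower}} = \Omega\!\left(\frac{H^3 d^2}{\epsilon^2}\right)
\]
% Ratio of the leading terms, ignoring constants and log factors:
\[
\frac{H^4 d^2 / \epsilon^2}{H^3 d^2 / \epsilon^2} = H
\]
```

Under these assumptions, the two bounds agree in their dependence on $d$ and $\epsilon$ and differ by a factor of $H$, up to logarithmic terms.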