Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Minimax Optimal Reward-free Reinforcement Learning in Linear MDPs

Authors: Pihe Hu, Yu Chen, Longbo Huang

ICLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove an Õ(H⁴d²/ε²) sample complexity upper bound for LSVI-RFE, where H is the episode length and d is the feature dimension. We also establish a sample complexity lower bound of Ω(H³d²/ε²).
Researcher Affiliation | Academia | Pihe Hu, Yu Chen, Longbo Huang: Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
Pseudocode | Yes | Algorithm 1 Least-Squares Value Iteration RFE (LSVI-RFE): Exploration Phase
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | This is a theoretical paper and does not involve empirical evaluation on datasets.
Dataset Splits | No | This is a theoretical paper with no empirical evaluation, so no dataset splits are provided.
Hardware Specification | No | The paper discusses computational complexity in theoretical terms (O-notation) but does not specify any hardware used for experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers.
Experiment Setup | No | This is a theoretical paper and does not describe an experimental setup with hyperparameters or training configurations.
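For reference, the sample complexity bounds quoted in the Research Type row can be written out in standard notation. This is a transcription of the claims as stated above, not a derivation; H is the episode length, d the feature dimension, and ε the target accuracy.

```latex
% Upper bound: episodes sufficient for LSVI-RFE in a linear MDP
% (tilde-O hides polylogarithmic factors).
N_{\mathrm{upper}} = \tilde{O}\!\left(\frac{H^{4} d^{2}}{\epsilon^{2}}\right)

% Lower bound: episodes required by any reward-free exploration algorithm.
N_{\mathrm{lower}} = \Omega\!\left(\frac{H^{3} d^{2}}{\epsilon^{2}}\right)
```

The two bounds match in d and ε and differ by a factor of H, which is why the title claims the result is "towards" minimax optimality rather than fully matching.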