Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Minimax Optimal Reward-free Reinforcement Learning in Linear MDPs

Authors: Pihe Hu, Yu Chen, Longbo Huang

ICLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove an Õ(H⁴d²/ε²) sample complexity upper bound for LSVI-RFE, where H is the episode length and d is the feature dimension. We also establish a sample complexity lower bound of Ω(H³d²/ε²).
Researcher Affiliation | Academia | Pihe Hu, Yu Chen, Longbo Huang: Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
Pseudocode | Yes | Algorithm 1 Least-Squares Value Iteration RFE (LSVI-RFE): Exploration Phase
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | This is a theoretical paper and does not involve empirical evaluation on datasets.
Dataset Splits | No | This is a theoretical paper with no empirical evaluation, so no dataset splits are provided.
Hardware Specification | No | The paper discusses computational complexity in theoretical terms (O-notation) but does not specify any hardware used for experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers.
Experiment Setup | No | This is a theoretical paper and does not describe an experimental setup with hyperparameters or training configurations.
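For reference, the sample complexity bounds quoted in the Research Type row can be written out in standard notation. This is a transcription of the claims as stated above, not a derivation; H is the episode length, d the feature dimension, and ε the target accuracy.

```latex
% Upper bound: episodes sufficient for LSVI-RFE in a linear MDP
% (tilde-O hides polylogarithmic factors).
N_{\mathrm{upper}} = \tilde{O}\!\left(\frac{H^{4} d^{2}}{\epsilon^{2}}\right)

% Lower bound: episodes required by any reward-free exploration algorithm.
N_{\mathrm{lower}} = \Omega\!\left(\frac{H^{3} d^{2}}{\epsilon^{2}}\right)
```

The two bounds match in d and ε and differ by a factor of H, which is why the title claims the result is "towards" minimax optimality rather than fully matching.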