Towards Minimax Optimal Reward-free Reinforcement Learning in Linear MDPs
Authors: Pihe Hu, Yu Chen, Longbo Huang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove an $\widetilde{O}(H^4 d^2/\epsilon^2)$ sample complexity upper bound for LSVI-RFE, where $H$ is the episode length and $d$ is the feature dimension. We also establish a sample complexity lower bound of $\Omega(H^3 d^2/\epsilon^2)$. |
| Researcher Affiliation | Academia | Pihe Hu, Yu Chen, Longbo Huang: Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China. {hph19,c-y19}@mails.tsinghua.edu.cn, longbohuang@tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1 Least-Squares Value Iteration RFE (LSVI-RFE): Exploration Phase |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | This is a theoretical paper and does not involve empirical evaluation on datasets. |
| Dataset Splits | No | This is a theoretical paper and does not involve empirical evaluation on datasets, thus no dataset splits are provided. |
| Hardware Specification | No | The paper discusses computational complexity in theoretical terms (O-notation) but does not specify any particular hardware used for experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers. |
| Experiment Setup | No | This is a theoretical paper and does not describe an experimental setup with hyperparameters or training configurations. |
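
As a reading aid for the Research Type row, the two quoted bounds can be compared directly. This is a sketch that assumes the logarithmic factors hidden by the tilde notation are ignored; the labels $N_{\text{upper}}$ and $N_{\text{lower}}$ are introduced here for illustration and are not the paper's notation.

```latex
% Assumption: polylogarithmic factors and constants are dropped.
% N_upper and N_lower are illustrative labels, not the paper's symbols.
\[
\frac{N_{\text{upper}}}{N_{\text{lower}}}
  = \frac{H^4 d^2 / \epsilon^2}{H^3 d^2 / \epsilon^2}
  = H
\]
```

Under that assumption, the upper and lower bounds agree in their dependence on $d$ and $\epsilon$ and differ only by a factor of $H$.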