Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs
Authors: Junkai Zhang, Weitong Zhang, Quanquan Gu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs)... We show that our algorithm only needs to explore $\tilde{O}(d^2\varepsilon^{-2})$ episodes... In addition, we provide an $\Omega(d^2\varepsilon^{-2})$ sample complexity lower bound, which matches the sample complexity of our algorithm up to logarithmic factors, suggesting that our algorithm is optimal. (These bounds are reconstructed in LaTeX below the table.) |
| Researcher Affiliation | Academia | Junkai Zhang¹, Weitong Zhang¹, Quanquan Gu¹. ¹Department of Computer Science, University of California, Los Angeles, California, USA. Correspondence to: Quanquan Gu <qgu@cs.ucla.edu>. |
| Pseudocode | Yes | Algorithm 1 (HF-UCRL-RFE++). Input: confidence radii $\{\beta_k\}$, regularization parameter $\lambda$, number of high-order estimators $M$. (A hedged sketch of the exploration loop appears below the table.) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not describe the use of any dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for validation, as it does not conduct empirical experiments. |
| Hardware Specification | No | The paper is theoretical and does not describe hardware specifications for experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations. |
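The abstract quoted in the Research Type row contains extraction-garbled complexity bounds. A reconstruction in LaTeX, assuming $\varepsilon$ denotes the target accuracy and $d$ the feature dimension of the linear mixture MDP, as in the rest of the paper:

```latex
% Upper bound: episodes needed by the proposed algorithm.
\tilde{O}\!\left(\frac{d^2}{\varepsilon^2}\right)
% Lower bound: sample complexity of any reward-free algorithm,
% matching the upper bound up to logarithmic factors.
\Omega\!\left(\frac{d^2}{\varepsilon^2}\right)
```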
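The Pseudocode row names Algorithm 1's inputs but cannot show how they interact. Below is a minimal, hypothetical Python sketch of a reward-free exploration loop in that style. The episode counts, the toy feature construction, and the simple first-order bonus $\beta_k \lVert \phi \rVert_{\Sigma^{-1}}$ are assumptions standing in for the paper's high-order moment estimator; this is not the authors' actual HF-UCRL-RFE++.

```python
# Hypothetical sketch of a reward-free exploration loop in the style of
# Algorithm 1 (HF-UCRL-RFE++); all names, sizes, and the first-order
# bonus are assumptions, not the paper's method.
import numpy as np

d = 4            # feature dimension of the linear mixture MDP (assumed)
lam = 1.0        # regularization parameter lambda (an Algorithm 1 input)
K = 200          # number of exploration episodes (assumed)
H = 10           # steps per episode (the paper's bounds are horizon-free)
n_actions = 5    # toy action count (assumed)

rng = np.random.default_rng(0)
theta_star = rng.normal(size=d)           # unknown mixture parameter theta*
theta_star /= np.linalg.norm(theta_star)

Sigma = lam * np.eye(d)                   # regularized Gram matrix
b = np.zeros(d)                           # accumulated regression targets
theta_hat = np.zeros(d)                   # ridge estimate of theta*

for k in range(1, K + 1):
    beta_k = np.sqrt(d * np.log(1.0 + k))  # assumed confidence-radius schedule
    for _ in range(H):
        # Toy stand-in for the value-dependent features phi_V(s, a) of a
        # linear mixture MDP, one candidate per action.
        candidates = rng.normal(size=(n_actions, d))
        candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
        # Reward-free exploration: take the action with the largest
        # elliptical width beta_k * ||phi||_{Sigma^{-1}}, a standard
        # first-order surrogate for the paper's high-order estimator.
        widths = np.array([
            beta_k * np.sqrt(p @ np.linalg.solve(Sigma, p)) for p in candidates
        ])
        phi = candidates[int(np.argmax(widths))]
        # Noisy regression target y ~ <phi, theta*> (value-targeted style).
        y = phi @ theta_star + rng.normal(scale=0.1)
        # Ridge-regression update of the model estimate.
        Sigma += np.outer(phi, phi)
        b += phi * y
    theta_hat = np.linalg.solve(Sigma, b)

print("estimation error:", np.linalg.norm(theta_hat - theta_star))
```

The greedy width-maximization step is what makes the loop reward-free: data are collected purely to shrink model uncertainty, and any reward function can be planned for afterward from the learned model.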