Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs

Authors: Junkai Zhang, Weitong Zhang, Quanquan Gu

ICML 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | "In this paper, we propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs)... We show that our algorithm only needs to explore $\tilde{O}(d^2\varepsilon^{-2})$ episodes... In addition, we provide an $\Omega(d^2\varepsilon^{-2})$ sample complexity lower bound, which matches the sample complexity of our algorithm up to logarithmic factors, suggesting that our algorithm is optimal." |
| Researcher Affiliation | Academia | "Junkai Zhang¹, Weitong Zhang¹, Quanquan Gu¹. ¹Department of Computer Science, University of California, Los Angeles, California, USA. Correspondence to: Quanquan Gu <qgu@cs.ucla.edu>." |
| Pseudocode | Yes | "Algorithm 1 HF-UCRL-RFE++. Input: confidence radii $\{\beta_k\}$, regularization parameter $\lambda$, number of the high-order estimators $M$." |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not describe the use of any dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for validation, as it does not conduct empirical experiments. |
| Hardware Specification | No | The paper is theoretical and does not describe hardware specifications for experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations. |
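The Pseudocode row above only quotes the header of Algorithm 1 (HF-UCRL-RFE++) and its inputs. For a concrete sense of how inputs of that form could be wired into a reward-free data-collection loop, the sketch below is a minimal, hypothetical Python illustration: the feature map, exploratory policy, transitions, and regression targets are stand-in stubs invented here for illustration, and this is not the paper's actual algorithm or released code (no code is released).

```python
import numpy as np

rng = np.random.default_rng(0)

def reward_free_exploration(num_episodes, horizon, d, beta, lam=1.0, M=2):
    """Hypothetical skeleton of a reward-free exploration phase for a linear
    mixture MDP with feature dimension d. Only the inputs (confidence radii
    beta, regularization lam, number of high-order estimators M) mirror the
    quoted Algorithm 1 header; every subroutine below is a placeholder."""
    # One regularized (ridge) regression system per high-order estimator.
    Sigma = [lam * np.eye(d) for _ in range(M)]
    b = [np.zeros(d) for _ in range(M)]
    dataset = []  # transitions collected without observing any reward

    def feature(state, action, order):
        # Stub feature map phi(s, a) for the order-th estimator.
        return rng.standard_normal(d) / np.sqrt(d)

    def pick_action(theta_hat, state, radius):
        # Stub action choice; a real implementation would maximize an
        # exploration bonus built from radius and the Gram matrices Sigma.
        return int(rng.integers(0, 2))

    for k in range(num_episodes):
        # Current parameter estimates from the regularized least squares.
        theta_hat = [np.linalg.solve(Sigma[m], b[m]) for m in range(M)]
        state = 0  # stub initial state
        for h in range(horizon):
            action = pick_action(theta_hat, state, beta[k])
            next_state = int(rng.integers(0, 10))  # stub transition, reward-free
            dataset.append((state, action, next_state))
            for m in range(M):
                phi = feature(state, action, m)
                target = float(next_state) ** (m + 1)  # stub moment target
                Sigma[m] += np.outer(phi, phi)
                b[m] += phi * target
            state = next_state
    return dataset

# Usage: collect reward-free data for 10 episodes of horizon 5 with d = 4.
data = reward_free_exploration(num_episodes=10, horizon=5, d=4,
                               beta=[1.0] * 10)
print(len(data))  # 50 transitions, gathered without any reward signal
```

The only structural elements taken from the quoted header are the per-episode confidence radii $\beta_k$, the regularization parameter $\lambda$, and the number of high-order estimators $M$, each of which drives one ridge-regression system in the sketch; everything else is a placeholder.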