Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs

Authors: Junkai Zhang, Weitong Zhang, Quanquan Gu

ICML 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | "In this paper, we propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs)... We show that our algorithm only needs to explore $\tilde{O}(d^2\varepsilon^{-2})$ episodes... In addition, we provide an $\Omega(d^2\varepsilon^{-2})$ sample complexity lower bound, which matches the sample complexity of our algorithm up to logarithmic factors, suggesting that our algorithm is optimal." |
| Researcher Affiliation | Academia | "Junkai Zhang¹, Weitong Zhang¹, Quanquan Gu¹. ¹Department of Computer Science, University of California, Los Angeles, California, USA. Correspondence to: Quanquan Gu <qgu@cs.ucla.edu>." |
| Pseudocode | Yes | "Algorithm 1 HF-UCRL-RFE++. Input: confidence radii $\{\beta_k\}$, regularization parameter $\lambda$, number of the high-order estimators $M$." |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not describe the use of any dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for validation, as it does not conduct empirical experiments. |
| Hardware Specification | No | The paper is theoretical and does not describe hardware specifications for experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations. |
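The Pseudocode row above only quotes the header of Algorithm 1 (HF-UCRL-RFE++) and its inputs. For a concrete sense of how inputs of that form could be wired into a reward-free data-collection loop, the sketch below is a minimal, hypothetical Python illustration: the feature map, exploratory policy, transitions, and regression targets are stand-in stubs invented here for illustration, and this is not the paper's actual algorithm or released code (no code is released).

```python
import numpy as np

rng = np.random.default_rng(0)

def reward_free_exploration(num_episodes, horizon, d, beta, lam=1.0, M=2):
    """Hypothetical skeleton of a reward-free exploration phase for a linear
    mixture MDP with feature dimension d. Only the inputs (confidence radii
    beta, regularization lam, number of high-order estimators M) mirror the
    quoted Algorithm 1 header; every subroutine below is a placeholder."""
    # One regularized (ridge) regression system per high-order estimator.
    Sigma = [lam * np.eye(d) for _ in range(M)]
    b = [np.zeros(d) for _ in range(M)]
    dataset = []  # transitions collected without observing any reward

    def feature(state, action, order):
        # Stub feature map phi(s, a) for the order-th estimator.
        return rng.standard_normal(d) / np.sqrt(d)

    def pick_action(theta_hat, state, radius):
        # Stub action choice; a real implementation would maximize an
        # exploration bonus built from radius and the Gram matrices Sigma.
        return int(rng.integers(0, 2))

    for k in range(num_episodes):
        # Current parameter estimates from the regularized least squares.
        theta_hat = [np.linalg.solve(Sigma[m], b[m]) for m in range(M)]
        state = 0  # stub initial state
        for h in range(horizon):
            action = pick_action(theta_hat, state, beta[k])
            next_state = int(rng.integers(0, 10))  # stub transition, reward-free
            dataset.append((state, action, next_state))
            for m in range(M):
                phi = feature(state, action, m)
                target = float(next_state) ** (m + 1)  # stub moment target
                Sigma[m] += np.outer(phi, phi)
                b[m] += phi * target
            state = next_state
    return dataset

# Usage: collect reward-free data for 10 episodes of horizon 5 with d = 4.
data = reward_free_exploration(num_episodes=10, horizon=5, d=4,
                               beta=[1.0] * 10)
print(len(data))  # 50 transitions, gathered without any reward signal
```

The only structural elements taken from the quoted header are the per-episode confidence radii $\beta_k$, the regularization parameter $\lambda$, and the number of high-order estimators $M$, each of which drives one ridge-regression system in the sketch; everything else is a placeholder.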