Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs
Authors: Junkai Zhang, Weitong Zhang, Quanquan Gu
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs)... We show that our algorithm only needs to explore $\tilde{O}(d^2\varepsilon^{-2})$ episodes... In addition, we provide an $\Omega(d^2\varepsilon^{-2})$ sample complexity lower bound, which matches the sample complexity of our algorithm up to logarithmic factors, suggesting that our algorithm is optimal. |
| Researcher Affiliation | Academia | Junkai Zhang¹, Weitong Zhang¹, Quanquan Gu¹. ¹Department of Computer Science, University of California, Los Angeles, California, USA. Correspondence to: Quanquan Gu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 HF-UCRL-RFE++ Input: Confidence radius $\{\beta_k\}$, regularization parameter $\lambda$, number of the high-order estimator $M$ |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not describe the use of any dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for validation, as it does not conduct empirical experiments. |
| Hardware Specification | No | The paper is theoretical and does not describe hardware specifications for experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations. |