On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs

Authors: Yuanzhou Chen, Jiafan He, Quanquan Gu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study reinforcement learning for infinite-horizon discounted linear kernel MDPs, where the transition probability function is linear in a predefined feature mapping. The existing UCLK algorithm (Zhou et al., 2021b) for this setting has only a regret guarantee, which cannot lead to a tight sample complexity bound. In this paper, we extend uniform-PAC sample complexity from the episodic setting to the infinite-horizon discounted setting, and propose a novel algorithm dubbed UPAC-UCLK that achieves an Õ(d²/((1 − γ)⁴ϵ²) + 1/((1 − γ)⁶ϵ²)) uniform-PAC sample complexity...
Researcher Affiliation | Academia | (1) School of Mathematical Sciences, Peking University, Beijing, China; (2) Computer Science Department, University of California, Los Angeles, Los Angeles, California, USA.
Pseudocode | Yes | Algorithm 1: Uniform-PAC UCLK (UPAC-UCLK). Require: regularization parameter λ, exploration parameters β_l for l = 1, 2, ..., number of value iteration rounds U_t for t = 1, 2, .... Algorithm 2: Multi-Level Extended Value Iteration (ML-EVI). Require: number of levels L, confidence sets C_l for l = 1, ..., L, number of value iteration rounds U.
Open Source Code | No | The paper focuses on theoretical contributions and algorithm design; it does not mention releasing source code for public use.
Open Datasets | No | The paper is theoretical and does not involve any empirical evaluation on datasets, so it does not mention specific datasets or their public availability.
Dataset Splits | No | The paper is theoretical, focusing on mathematical proofs and algorithm design rather than empirical evaluation, so it does not specify any training/validation/test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any experiments or the hardware used to run them.
Software Dependencies | No | The paper is theoretical, focusing on algorithm design and mathematical proofs. It does not specify any software dependencies with version numbers required for practical implementation or experimentation.
Experiment Setup | No | The paper is purely theoretical, providing algorithm design and theoretical guarantees. It does not include details on experimental setup, hyperparameters, or training configurations.
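To make the Pseudocode row above concrete, here is a minimal illustrative sketch of the extended value iteration loop that UCLK-style algorithms build on. This is not the paper's ML-EVI: the function name, the finite state/action space, the array layout, and the use of a small finite candidate set `theta_set` in place of the confidence set C are all simplifying assumptions for illustration. It only shows the characteristic optimistic update, where each round maximizes over actions and over transition parameters in the confidence set.

```python
import numpy as np

def evi_sketch(r, phi, theta_set, gamma, num_rounds):
    """Illustrative extended value iteration (hypothetical simplification).

    r         : (S, A) reward table.
    phi       : (S', S, A, d) feature tensor, with the linear kernel MDP
                assumption P_theta(s'|s,a) = <theta, phi[s', s, a]>.
    theta_set : finite list of (d,) candidate parameters standing in for
                the confidence set C of the true transition parameter.
    """
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(num_rounds):
        # phi_V[s, a, :] = sum_{s'} V[s'] * phi[s', s, a, :]
        phi_V = np.tensordot(V, phi, axes=(0, 0))            # (S, A, d)
        # Optimistic transition value: max over the confidence set.
        opt = np.max(np.stack([phi_V @ th for th in theta_set]), axis=0)
        Q = r + gamma * opt                                   # (S, A)
        V = Q.max(axis=1)                                     # greedy step
    return V, Q
```

With a singleton `theta_set` containing the true parameter, the optimism step is vacuous and the loop reduces to ordinary value iteration; the multi-level structure in ML-EVI additionally maintains one confidence set per level l = 1, ..., L.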