On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs
Authors: Yuanzhou Chen, Jiafan He, Quanquan Gu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study reinforcement learning for infinite-horizon discounted linear kernel MDPs, where the transition probability function is linear in a predefined feature mapping. The existing UCLK algorithm (Zhou et al., 2021b) for this setting only has a regret guarantee, which cannot lead to a tight sample complexity bound. In this paper, we extend uniform-PAC sample complexity from the episodic setting to the infinite-horizon discounted setting, and propose a novel algorithm dubbed UPAC-UCLK that achieves an $\tilde{O}\big(d^2/((1-\gamma)^4\epsilon^2) + 1/((1-\gamma)^6\epsilon^2)\big)$ uniform-PAC sample complexity... |
| Researcher Affiliation | Academia | (1) School of Mathematical Sciences, Peking University, Beijing, China; (2) Computer Science Department, University of California at Los Angeles, Los Angeles, California, USA. |
| Pseudocode | Yes | Algorithm 1: Uniform-PAC UCLK (UPAC-UCLK). Require: regularization parameter λ, exploration parameters βₗ for l = 1, 2, ..., number of value-iteration rounds Uₜ for t = 1, 2, .... Algorithm 2: Multi-Level Extended Value Iteration (ML-EVI). Require: number of levels L, confidence sets Cₗ for l = 1, ..., L, number of value-iteration rounds U. |
| Open Source Code | No | The paper focuses on theoretical contributions and algorithm design; it does not mention releasing source code for public use. |
| Open Datasets | No | The paper is theoretical and does not involve any empirical evaluation on datasets. Thus, it does not mention specific datasets or their public availability. |
| Dataset Splits | No | The paper is theoretical and focuses on mathematical proofs and algorithm design, not empirical evaluation. Therefore, it does not specify any training/validation/test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments or the specific hardware used to run them. |
| Software Dependencies | No | The paper is theoretical, focusing on algorithm design and mathematical proofs. It does not specify any software dependencies with version numbers required for practical implementation or experimentation. |
| Experiment Setup | No | The paper is purely theoretical, providing algorithm design and theoretical guarantees. It does not include details on experimental setup, hyperparameters, or training configurations. |
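Although the paper provides no implementation, the extended value iteration (EVI) primitive named in the pseudocode row has a well-known generic shape: repeat an optimistic Bellman backup with an exploration bonus for a fixed number of rounds. The sketch below illustrates only that generic idea on a toy tabular MDP; the transition matrix `P`, the constant `bonus`, and all dimensions are hypothetical stand-ins, and the paper's multi-level confidence sets (ML-EVI) are omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP, purely for illustration (not from the paper).
S, A = 4, 2                                # states, actions
gamma = 0.9                                # discount factor
r = rng.uniform(size=(S, A))               # rewards in [0, 1]
P = rng.dirichlet(np.ones(S), size=(S, A)) # P[s, a, s'], rows sum to 1

def extended_value_iteration(P_hat, r, gamma, bonus, U):
    """Generic optimistic value iteration sketch.

    Runs U rounds of the Bellman backup with an additive exploration
    bonus, clipping Q-values to the valid range [0, 1/(1 - gamma)].
    This is the standard EVI idea, not the paper's ML-EVI.
    """
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(U):                           # U value-iteration rounds
        Q = r + gamma * (P_hat @ V) + bonus      # optimistic backup
        Q = np.minimum(Q, 1.0 / (1.0 - gamma))   # clip to value range
        V = Q.max(axis=1)                        # greedy value
    return V, Q

V, Q = extended_value_iteration(P, r, gamma,
                                bonus=0.05 * np.ones((S, A)), U=50)
```

In the paper's linear kernel setting, `P_hat @ V` would instead be computed through the feature mapping and an estimated parameter vector, and the bonus would come from the confidence sets Cₗ; the fixed-point structure of the loop is the same.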