On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs

Authors: Yuanzhou Chen, Jiafan He, Quanquan Gu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study reinforcement learning for infinite-horizon discounted linear kernel MDPs, where the transition probability function is linear in a predefined feature mapping. The existing UCLK algorithm (Zhou et al., 2021b) for this setting has only a regret guarantee, which cannot lead to a tight sample complexity bound. In this paper, we extend uniform-PAC sample complexity from the episodic setting to the infinite-horizon discounted setting, and propose a novel algorithm dubbed UPAC-UCLK that achieves an Õ(d²/((1 − γ)⁴ϵ²) + 1/((1 − γ)⁶ϵ²)) uniform-PAC sample complexity...
Researcher Affiliation | Academia | (1) School of Mathematical Sciences, Peking University, Beijing, China; (2) Computer Science Department, University of California, Los Angeles, Los Angeles, California, USA.
Pseudocode | Yes | Algorithm 1: Uniform-PAC UCLK (UPAC-UCLK). Require: regularization parameter λ, exploration parameters β_l for l = 1, 2, ..., number of value iteration rounds U_t for t = 1, 2, .... Algorithm 2: Multi-Level Extended Value Iteration (ML-EVI). Require: number of levels L, confidence sets C_l for l = 1, ..., L, number of value iteration rounds U.
Open Source Code | No | The paper focuses on theoretical contributions and algorithm design; it does not mention releasing source code for public use.
Open Datasets | No | The paper is theoretical and does not involve any empirical evaluation on datasets, so it does not mention specific datasets or their public availability.
Dataset Splits | No | The paper is theoretical, focusing on mathematical proofs and algorithm design rather than empirical evaluation, so it does not specify any training/validation/test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any experiments or the hardware used to run them.
Software Dependencies | No | The paper is theoretical, focusing on algorithm design and mathematical proofs. It does not specify any software dependencies with version numbers required for practical implementation or experimentation.
Experiment Setup | No | The paper is purely theoretical, providing algorithm design and theoretical guarantees. It does not include details on experimental setup, hyperparameters, or training configurations.
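To make the Pseudocode row above concrete, here is a minimal illustrative sketch of the extended value iteration loop that UCLK-style algorithms build on. This is not the paper's ML-EVI: the function name, the finite state/action space, the array layout, and the use of a small finite candidate set `theta_set` in place of the confidence set C are all simplifying assumptions for illustration. It only shows the characteristic optimistic update, where each round maximizes over actions and over transition parameters in the confidence set.

```python
import numpy as np

def evi_sketch(r, phi, theta_set, gamma, num_rounds):
    """Illustrative extended value iteration (hypothetical simplification).

    r         : (S, A) reward table.
    phi       : (S', S, A, d) feature tensor, with the linear kernel MDP
                assumption P_theta(s'|s,a) = <theta, phi[s', s, a]>.
    theta_set : finite list of (d,) candidate parameters standing in for
                the confidence set C of the true transition parameter.
    """
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(num_rounds):
        # phi_V[s, a, :] = sum_{s'} V[s'] * phi[s', s, a, :]
        phi_V = np.tensordot(V, phi, axes=(0, 0))            # (S, A, d)
        # Optimistic transition value: max over the confidence set.
        opt = np.max(np.stack([phi_V @ th for th in theta_set]), axis=0)
        Q = r + gamma * opt                                   # (S, A)
        V = Q.max(axis=1)                                     # greedy step
    return V, Q
```

With a singleton `theta_set` containing the true parameter, the optimism step is vacuous and the loop reduces to ordinary value iteration; the multi-level structure in ML-EVI additionally maintains one confidence set per level l = 1, ..., L.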