Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation

Authors: Jianliang He, Han Zhong, Zhuoran Yang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation. Specifically, we propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP), which incorporates both model-based and value-based incarnations. In particular, LOOP features a novel construction of confidence sets and a low-switching policy updating scheme, which are tailored to the average-reward and function approximation setting. Moreover, for AMDPs, we propose a novel complexity measure, the average-reward generalized eluder coefficient (AGEC), which captures the challenge of exploration in AMDPs with general function approximation. Using AGEC, we prove that LOOP achieves a sublinear Õ(poly(d, sp(V*)) √(Tβ)) regret, where d and β correspond to AGEC and the log-covering number of the hypothesis class respectively, sp(V*) is the span of the optimal state bias function, T denotes the number of steps, and Õ(·) omits logarithmic factors. To the best of our knowledge, this paper presents the first comprehensive theoretical framework capable of handling nearly all AMDPs. (The reported bound is restated in display form after this table.)
Researcher Affiliation | Academia | Jianliang He, Department of Statistics and Data Science, Fudan University (hejl@fudan.edu.cn); Han Zhong, Center for Data Science, Peking University (hanzhong@stu.pku.edu.cn); Zhuoran Yang, Department of Statistics and Data Science, Yale University (zhuoran.yang@yale.edu)
Pseudocode | Yes | Algorithm 1: Local-fitted Optimization with Optimism, LOOP(H, G, T, δ); Algorithm 2: MLE-based Local-fitted Optimization with Optimism, MLE-LOOP(H, T, δ); Algorithm 3: Extended Value Iteration (EVI). (A hedged sketch of the LOOP template appears after this table.)
Open Source Code | No | The paper does not contain any statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | No | This is a theoretical paper and does not conduct experiments on a specific dataset; therefore, there is no mention of a publicly available dataset.
Dataset Splits | No | This is a theoretical paper and does not conduct experiments; therefore, no training/validation/test dataset splits are mentioned.
Hardware Specification | No | This is a theoretical paper that does not report computational experiments; therefore, no hardware specifications are provided.
Software Dependencies | No | This is a theoretical paper that does not report computational experiments; therefore, no software dependencies with version numbers are listed.
Experiment Setup | No | This is a theoretical paper and does not conduct experiments; therefore, no experimental setup details such as hyperparameters or training settings are provided.
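For concreteness, the guarantee quoted in the Research Type row can be written in display form. The abstract states only the bound itself; the left-hand definition of average-reward regret, the optimal average reward J* minus the rewards actually collected, is the standard notion for AMDPs and is our assumption about the quantity being bounded.

```latex
% Average-reward regret (standard definition, assumed here) and the bound from the abstract.
\mathrm{Reg}(T) \;:=\; \sum_{t=1}^{T} \bigl( J^{*} - r(s_t, a_t) \bigr),
\qquad
\mathrm{Reg}(T) \;\le\; \widetilde{O}\Bigl( \mathrm{poly}\bigl(d,\, \mathrm{sp}(V^{*})\bigr) \sqrt{T\beta} \Bigr).
```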
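The Pseudocode row names three algorithms but cannot convey their structure. Below is a minimal Python sketch of the generic LOOP template (Algorithm 1), written from the abstract's description alone: the discrepancy function `ell`, the optimistic planning oracle `optimistic_planner`, and the doubling-style switching test are illustrative stand-ins, not the paper's exact constructions.

```python
# Hypothetical sketch of the generic LOOP template (Algorithm 1).
# `ell`, `optimistic_planner`, and the switching test are illustrative
# stand-ins for the paper's discrepancy function, optimistic planning
# oracle, and low-switching criterion -- not its exact constructions.

def loop_sketch(hypothesis_class, T, beta, ell, optimistic_planner, env, s0):
    """Local-fitted Optimization with OPtimism: optimistic planning
    over a data-driven confidence set, with lazy policy switching."""
    history = []          # past transitions (s, a, r, s')
    anchor = 0            # index of the first sample after the last switch
    f_hat, policy = optimistic_planner(hypothesis_class)
    s = s0
    for _ in range(T):
        a = policy(s)
        r, s_next = env.step(s, a)   # env.step is assumed to return (reward, next_state)
        history.append((s, a, r, s_next))
        # Low-switching rule (assumed doubling-style test): re-plan only when
        # the discrepancy accumulated since the last switch exceeds beta.
        if sum(ell(f_hat, z) for z in history[anchor:]) > beta:
            # Confidence set: hypotheses whose total fitted discrepancy is
            # within beta of the best fit on all past data (illustrative form).
            best = min(sum(ell(g, z) for z in history) for g in hypothesis_class)
            confident = [f for f in hypothesis_class
                         if sum(ell(f, z) for z in history) <= best + beta]
            # Optimism: select the hypothesis/policy pair with the largest
            # optimal average reward inside the confidence set.
            f_hat, policy = optimistic_planner(confident)
            anchor = len(history)
        s = s_next
    return history
```

Presumably, the model-based incarnation would implement the planner via Extended Value Iteration (Algorithm 3), while MLE-LOOP (Algorithm 2) would build its confidence set from maximum-likelihood estimates rather than the generic discrepancy above.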