reproducibilityindex.ai

Finite-Time Analysis for Double Q-learning

Authors: Huaqing Xiong, Lin Zhao, Yingbin Liang, Wei Zhang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper, we provide the ﬁrst non-asymptotic (i.e., ﬁnite-time) analysis for double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an ϵ-accurate neighborhood of the global optimum by taking Ω 1 (1 γ)6ϵ2 1 ω + 1 1 γ 1 1 ω iterations, where ω (0, 1) is the decay parameter of the learning rate, and γ is the discount factor. Our analysis develops novel techniques to derive ﬁnite-time bounds on the difference between two inter-connected stochastic processes, which is new to the literature of stochastic approximation.
Researcher Affiliation	Academia	1The Ohio State University 2National University of Singapore 3Southern University of Science and Technology 4Peng Cheng Laboratory 1{xiong.309, liang.889}@osu.edu; 2elezhli@nus.edu.sg; 3,4zhangw3@sustech.edu.cn
Pseudocode	Yes	Algorithm 1 Synchronous Double Q-learning (Hasselt, 2010)
Open Source Code	No	The paper does not provide concrete access to source code for the methodology described.
Open Datasets	No	The paper is theoretical and does not discuss using or providing access to any dataset for training.
Dataset Splits	No	The paper is theoretical and does not discuss dataset splits for validation.
Hardware Specification	No	The paper is theoretical and does not mention any specific hardware used for experiments.
Software Dependencies	No	The paper is theoretical and does not specify any software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and does not provide details about an experimental setup, hyperparameters, or training configurations.