Finite-Time Analysis for Double Q-learning
Authors: Huaqing Xiong, Lin Zhao, Yingbin Liang, Wei Zhang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we provide the first non-asymptotic (i.e., finite-time) analysis for double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an ϵ-accurate neighborhood of the global optimum by taking Ω̃( (1/((1−γ)^6 ϵ^2))^{1/ω} + (1/(1−γ))^{1/(1−ω)} ) iterations, where ω ∈ (0, 1) is the decay parameter of the learning rate, and γ is the discount factor. Our analysis develops novel techniques to derive finite-time bounds on the difference between two inter-connected stochastic processes, which is new to the literature of stochastic approximation. |
| Researcher Affiliation | Academia | 1: The Ohio State University; 2: National University of Singapore; 3: Southern University of Science and Technology; 4: Peng Cheng Laboratory. 1: {xiong.309, liang.889}@osu.edu; 2: elezhli@nus.edu.sg; 3,4: zhangw3@sustech.edu.cn |
| Pseudocode | Yes | Algorithm 1 Synchronous Double Q-learning (Hasselt, 2010) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not discuss using or providing access to any dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not discuss dataset splits for validation. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not provide details about an experimental setup, hyperparameters, or training configurations. |
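The iteration complexity quoted in the Research Type row above, reconstructed from the paper's abstract, reads in LaTeX as:

```latex
T = \tilde{\Omega}\!\left(
      \left(\frac{1}{(1-\gamma)^{6}\epsilon^{2}}\right)^{\frac{1}{\omega}}
    + \left(\frac{1}{1-\gamma}\right)^{\frac{1}{1-\omega}}
\right),
\qquad \omega \in (0, 1),
```

where ω is the decay parameter of the learning rate and γ is the discount factor.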
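The paper provides only pseudocode (Algorithm 1) and no source code, so for orientation here is a minimal, unofficial sketch of the double Q-learning update of Hasselt (2010) that the analysis studies: two estimators, one chosen at random per step, with the updated estimator selecting the greedy action and the other evaluating it. The toy one-state MDP, the step-size schedule α_t = 1/t^ω (matching the paper's polynomial learning-rate decay), and all names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, alpha, rng, gamma=0.5):
    """One double Q-learning step (Hasselt, 2010), illustrative sketch.

    With probability 1/2, update QA: select the greedy action at s_next
    with QA but evaluate it with QB; otherwise update QB symmetrically.
    Decoupling selection from evaluation reduces the overestimation
    bias of standard Q-learning.
    """
    if rng.random() < 0.5:
        a_star = int(np.argmax(QA[s_next]))            # select with QA
        target = r + gamma * QB[s_next, a_star]        # evaluate with QB
        QA[s, a] += alpha * (target - QA[s, a])
    else:
        b_star = int(np.argmax(QB[s_next]))            # select with QB
        target = r + gamma * QA[s_next, b_star]        # evaluate with QA
        QB[s, a] += alpha * (target - QB[s, a])

# Demo on a hypothetical one-state, one-action MDP with reward 1 and
# gamma = 0.5, so the optimal value is Q* = 1 / (1 - gamma) = 2.
rng = np.random.default_rng(0)
QA = np.zeros((1, 1))
QB = np.zeros((1, 1))
omega = 0.8  # learning-rate decay parameter, alpha_t = 1 / t**omega
for t in range(1, 20001):
    double_q_update(QA, QB, s=0, a=0, r=1.0, s_next=0,
                    alpha=1.0 / t**omega, rng=rng)
```

Both estimators converge toward Q* = 2 here; the paper's contribution is bounding how fast the pair of inter-connected stochastic processes (QA, QB) reaches an ϵ-neighborhood of the optimum.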