Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Finite-Time Analysis for Double Q-learning
Authors: Huaqing Xiong, Lin Zhao, Yingbin Liang, Wei Zhang
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we provide the first non-asymptotic (i.e., finite-time) analysis for double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an ϵ-accurate neighborhood of the global optimum by taking Ω 1 (1 γ)6ϵ2 1 ω + 1 1 γ 1 1 ω iterations, where ω (0, 1) is the decay parameter of the learning rate, and γ is the discount factor. Our analysis develops novel techniques to derive finite-time bounds on the difference between two inter-connected stochastic processes, which is new to the literature of stochastic approximation. |
| Researcher Affiliation | Academia | 1The Ohio State University 2National University of Singapore 3Southern University of Science and Technology 4Peng Cheng Laboratory 1EMAIL; EMAIL; 3,EMAIL |
| Pseudocode | Yes | Algorithm 1 Synchronous Double Q-learning (Hasselt, 2010) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not discuss using or providing access to any dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not discuss dataset splits for validation. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not provide details about an experimental setup, hyperparameters, or training configurations. |