Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
On the Estimation Bias in Double Q-Learning
Authors: Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our proposed method in the Atari benchmark tasks and demonstrate its significant improvement over baseline algorithms. |
| Researcher Affiliation | Academia | Zhizhou Ren¹, Guangxiang Zhu², Hao Hu², Beining Han², Jianglun Chen², Chongjie Zhang²; ¹Department of Computer Science, University of Illinois at Urbana-Champaign; ²Institute for Interdisciplinary Information Sciences, Tsinghua University |
| Pseudocode | No | The paper describes its algorithms and methods in text and equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/Stilwell-Git/Doubly-Bounded-Q-Learning. |
| Open Datasets | Yes | We extensively evaluate our proposed method in the Atari benchmark tasks and demonstrate its significant improvement over baseline algorithms. |
| Dataset Splits | No | The paper mentions evaluating on Atari benchmark tasks but does not specify training, validation, or test dataset splits in the main text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment. |
| Experiment Setup | Yes | To approximate practical scenarios, we simulate the approximation error by entity-wise Gaussian noises $\mathcal{N}(0, 0.5)$. Since the stochastic process induced by such noises suffers from high variance, we perform soft update $Q^{(t+1)} = (1-\alpha)Q^{(t)} + \alpha(\widetilde{\mathcal{T}} Q^{(t)})$ to make the visualization clear, in which $\alpha$ refers to learning rate in practice. We consider $\alpha = 10^{-2}$ for all experiments presented in this section. |
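The soft update quoted in the Experiment Setup row can be sketched in Python to show how the noisy operator and the learning rate interact. This is a minimal sketch under stated assumptions, not the authors' code: the deterministic tabular MDP, the function names (`bellman_optimality`, `noisy_soft_update`, `run`) and the discount γ = 0.9 are hypothetical; $\widetilde{\mathcal{T}}$ is assumed to be the exact Bellman optimality operator plus independent per-entry Gaussian noise; and the 0.5 in $\mathcal{N}(0, 0.5)$ is treated as a standard deviation, which the quoted text does not specify.

```python
import random
from typing import List, Tuple

# Hypothetical deterministic tabular MDP: mdp[s][a] = (next_state, reward).
MDP = List[List[Tuple[int, float]]]
QTable = List[List[float]]


def bellman_optimality(q: QTable, mdp: MDP, gamma: float) -> QTable:
    """Exact operator: (TQ)(s, a) = r(s, a) + gamma * max_a' Q(s', a')."""
    return [
        [r + gamma * max(q[s_next]) for (s_next, r) in mdp[s]]
        for s in range(len(mdp))
    ]


def noisy_soft_update(q: QTable, mdp: MDP, gamma: float, alpha: float,
                      sigma: float, rng: random.Random) -> QTable:
    """One step of Q <- (1 - alpha) * Q + alpha * (T~Q).

    Assumption: T~Q = TQ + eps, with eps ~ N(0, sigma) drawn independently
    for every (s, a) entry; sigma is used as a standard deviation.
    """
    tq = bellman_optimality(q, mdp, gamma)
    return [
        [(1 - alpha) * q[s][a] + alpha * (tq[s][a] + rng.gauss(0.0, sigma))
         for a in range(len(mdp[s]))]
        for s in range(len(mdp))
    ]


def run(mdp: MDP, gamma: float = 0.9, alpha: float = 1e-2, sigma: float = 0.5,
        steps: int = 5000, seed: int = 0) -> QTable:
    """Iterate the noisy soft update from Q = 0 with a seeded RNG."""
    rng = random.Random(seed)
    q: QTable = [[0.0] * len(mdp[s]) for s in range(len(mdp))]
    for _ in range(steps):
        q = noisy_soft_update(q, mdp, gamma, alpha, sigma, rng)
    return q
```

With `sigma=0.0` the update reduces to a damped value iteration, so a single self-looping state with reward 1 approaches $Q^* = 1/(1-\gamma) = 10$. With noise and several actions, the `max` inside the operator tends to push the estimates above the true values on average, which is the kind of overestimation the paper studies; the small α averages the noise so that drift is easier to see than with α = 1.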