reproducibilityindex.ai

On the Estimation Bias in Double Q-Learning

Authors: Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We extensively evaluate our proposed method in the Atari benchmark tasks and demonstrate its significant improvement over baseline algorithms.
Researcher Affiliation	Academia	Zhizhou Ren¹, Guangxiang Zhu², Hao Hu², Beining Han², Jianglun Chen², Chongjie Zhang². ¹Department of Computer Science, University of Illinois at Urbana-Champaign; ²Institute for Interdisciplinary Information Sciences, Tsinghua University
Pseudocode	No	The paper describes its algorithms and methods in text and equations but does not include structured pseudocode or algorithm blocks.
Open Source Code	Yes	Our code is available at https://github.com/Stilwell-Git/Doubly-Bounded-Q-Learning.
Open Datasets	Yes	We extensively evaluate our proposed method in the Atari benchmark tasks and demonstrate its significant improvement over baseline algorithms.
Dataset Splits	No	The paper mentions evaluating on Atari benchmark tasks but does not specify training, validation, or test dataset splits in the main text.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment.
Experiment Setup	Yes	To approximate practical scenarios, we simulate the approximation error by entity-wise Gaussian noises $\mathcal{N}(0, 0.5)$. Since the stochastic process induced by such noises suffers from high variance, we perform soft update $Q^{(t+1)} = (1 - \alpha) Q^{(t)} + \alpha (\widetilde{\mathcal{T}} Q^{(t)})$ to make the visualization clear, in which $\alpha$ refers to learning rate in practice. We consider $\alpha = 10^{-2}$ for all experiments presented in this section.
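
The soft update quoted in the Experiment Setup row can be illustrated with a minimal sketch on a tabular MDP. This is not the authors' code: the function names, the array layout (`rewards` of shape `(S, A)`, `transitions` of shape `(S, A, S)`), and the use of a plain Bellman optimality backup as the noise-free operator are assumptions made for illustration. The quoted text does not say whether 0.5 in N(0, 0.5) is a variance or a standard deviation; the sketch treats it as a standard deviation.

```python
import numpy as np


def noisy_bellman_backup(q, rewards, transitions, gamma, noise_std, rng):
    """Bellman optimality backup with entry-wise Gaussian noise added.

    Hypothetical helper: q and rewards have shape (S, A),
    transitions has shape (S, A, S) and holds P(s' | s, a).
    """
    # Greedy state values: V(s') = max_a Q(s', a)
    v = q.max(axis=1)
    # Noise-free backup: r(s, a) + gamma * sum_{s'} P(s' | s, a) V(s')
    target = rewards + gamma * (transitions @ v)
    # Assumption: noise_std is a standard deviation (0.5 in the quoted setup)
    return target + rng.normal(0.0, noise_std, size=q.shape)


def soft_update(q, noisy_target, alpha):
    """Soft update: Q_next = (1 - alpha) * Q + alpha * noisy_target."""
    return (1.0 - alpha) * q + alpha * noisy_target
```

Under these assumptions, one iteration of the quoted process would be `q = soft_update(q, noisy_bellman_backup(q, rewards, transitions, gamma, 0.5, rng), 1e-2)` with `rng = np.random.default_rng(seed)`. A small `alpha` averages the noise over many steps, which is why the quoted setup uses it to reduce variance in the visualization.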