On the Estimation Bias in Double Q-Learning

Authors: Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate our proposed method in the Atari benchmark tasks and demonstrate its significant improvement over baseline algorithms.
Researcher Affiliation | Academia | Zhizhou Ren1, Guangxiang Zhu2, Hao Hu2, Beining Han2, Jianglun Chen2, Chongjie Zhang2. 1Department of Computer Science, University of Illinois at Urbana-Champaign; 2Institute for Interdisciplinary Information Sciences, Tsinghua University
Pseudocode | No | The paper describes its algorithms and methods in text and equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/Stilwell-Git/Doubly-Bounded-Q-Learning.
Open Datasets | Yes | We extensively evaluate our proposed method in the Atari benchmark tasks and demonstrate its significant improvement over baseline algorithms.
Dataset Splits | No | The paper mentions evaluating on Atari benchmark tasks but does not specify training, validation, or test dataset splits in the main text.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment.
Experiment Setup | Yes | To approximate practical scenarios, we simulate the approximation error by entity-wise Gaussian noises N(0, 0.5). Since the stochastic process induced by such noises suffers from high variance, we perform the soft update Q^(t+1) = (1 − α)Q^(t) + α(T̃Q^(t)) to make the visualization clear, in which T̃ denotes the noise-perturbed Bellman operator and α refers to the learning rate in practice. We consider α = 10^(-2) for all experiments presented in this section.
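The quoted setup is concrete enough to sketch in code. Below is a minimal sketch of the noisy soft-update simulation, assuming a small random MDP as a stand-in: the toy transitions P, rewards R, discount gamma, sizes, seed, and step count are placeholder assumptions, and N(0, 0.5) is read as mean 0, variance 0.5. Only the noise distribution and the learning rate α = 10^(-2) come from the quoted description; this is not the authors' released code (see the repository linked above).

```python
# Sketch of the paper's tabular simulation: noisy Bellman targets with a soft update.
# The MDP below is a hypothetical placeholder; only alpha = 1e-2 and the
# N(0, 0.5) noise come from the quoted experiment setup.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, gamma = 5, 3, 0.9
# Random transition kernel P[s, a, s'] (each row sums to 1) and reward table R[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

def bellman(Q):
    """Exact Bellman optimality operator: (T Q)(s, a) = R(s, a) + gamma * E_s'[max_a' Q(s', a')]."""
    return R + gamma * P @ Q.max(axis=1)

alpha = 1e-2               # learning rate from the quoted setup
noise_std = np.sqrt(0.5)   # N(0, 0.5) read as variance 0.5 (an assumption)

Q = np.zeros((n_states, n_actions))
for _ in range(20_000):
    # Entity-wise Gaussian noise stands in for approximation error on the target.
    noisy_target = bellman(Q) + rng.normal(0.0, noise_std, size=Q.shape)
    # Soft update: Q <- (1 - alpha) * Q + alpha * (T~ Q).
    Q = (1 - alpha) * Q + alpha * noisy_target
```

Because the noise enters each step with weight α, the iterate behaves like an exponential moving average of noisy targets, which is why the soft update tames the variance enough to make the bias visible in a plot.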