On the Estimation Bias in Double Q-Learning
Authors: Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our proposed method in the Atari benchmark tasks and demonstrate its significant improvement over baseline algorithms. |
| Researcher Affiliation | Academia | Zhizhou Ren¹, Guangxiang Zhu², Hao Hu², Beining Han², Jianglun Chen², Chongjie Zhang². ¹Department of Computer Science, University of Illinois at Urbana-Champaign; ²Institute for Interdisciplinary Information Sciences, Tsinghua University |
| Pseudocode | No | The paper describes its algorithms and methods in text and equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/Stilwell-Git/Doubly-Bounded-Q-Learning. |
| Open Datasets | Yes | We extensively evaluate our proposed method in the Atari benchmark tasks and demonstrate its significant improvement over baseline algorithms. |
| Dataset Splits | No | The paper mentions evaluating on Atari benchmark tasks but does not specify training, validation, or test dataset splits in the main text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment. |
| Experiment Setup | Yes | To approximate practical scenarios, we simulate the approximation error by entity-wise Gaussian noises $\mathcal{N}(0, 0.5)$. Since the stochastic process induced by such noises suffers from high variance, we perform soft update $Q^{(t+1)} = (1-\alpha)Q^{(t)} + \alpha(\widetilde{\mathcal{T}} Q^{(t)})$ to make the visualization clear, in which $\alpha$ refers to learning rate in practice. We consider $\alpha = 10^{-2}$ for all experiments presented in this section. |
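
The soft update quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: the function name `soft_update_q_iteration`, the random tabular MDP, the discount factor, and the step count are all hypothetical choices made for illustration. The sketch also assumes that 0.5 is the standard deviation of the noise (the quote does not say whether it is a standard deviation or a variance), and it uses a plain max-based Bellman backup with additive noise as a stand-in for the noisy operator, whereas the paper studies double Q-learning variants.

```python
import numpy as np


def soft_update_q_iteration(P, R, gamma=0.9, alpha=1e-2, noise_std=0.5,
                            steps=2000, seed=0):
    """Iterate Q <- (1 - alpha) * Q + alpha * (noisy Bellman backup of Q).

    P: transition probabilities, shape (S, A, S)
    R: expected rewards, shape (S, A)
    noise_std: assumed standard deviation of the entry-wise Gaussian noise
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros_like(R, dtype=float)
    for _ in range(steps):
        # Greedy state values under the current estimate, shape (S,)
        V = Q.max(axis=1)
        # Exact Bellman optimality backup, shape (S, A)
        target = R + gamma * (P @ V)
        # Simulated approximation error: independent noise on every entry
        noisy_target = target + rng.normal(0.0, noise_std, size=target.shape)
        # Soft update with step size alpha
        Q = (1.0 - alpha) * Q + alpha * noisy_target
    return Q


# Hypothetical 3-state, 2-action MDP used only to exercise the function.
mdp_rng = np.random.default_rng(1)
P = mdp_rng.dirichlet(np.ones(3), size=(3, 2))  # shape (3, 2, 3), rows sum to 1
R = mdp_rng.uniform(size=(3, 2))
Q = soft_update_q_iteration(P, R)
```

Setting `noise_std=0.0` and `alpha=1.0` reduces the loop to ordinary value iteration, which is a convenient sanity check; a small `alpha` such as the quoted $10^{-2}$ averages the noise over many steps and so smooths the trajectory of the estimates.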