Reinforcement Learning with Dynamic Boltzmann Softmax Updates

Authors: Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on Grid World show that the DBS operator enables better estimation of the value function, which rectifies the convergence issue of the softmax operator. Finally, we propose the DBS-DQN algorithm by applying the DBS operator, which outperforms DQN substantially in 40 out of 49 Atari games.
Researcher Affiliation | Collaboration | Ling Pan (1), Qingpeng Cai (2), Qi Meng (3), Wei Chen (3), Longbo Huang (1); (1) IIIS, Tsinghua University; (2) Alibaba Group; (3) Microsoft Research
Pseudocode | Yes | Algorithm 1: DBS Deep Q-Network (see the operator sketch after this table)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it explicitly state that the code is open-source or available.
Open Datasets | Yes | We first evaluate DBS value iteration and DBS Q-learning on a tabular game, the Grid World. We then evaluate the DBS-DQN algorithm on 49 Atari video games from the Arcade Learning Environment [Bellemare et al., 2013], a standard challenging benchmark for deep reinforcement learning algorithms, by comparing it with DQN.
Dataset Splits | No | The paper uses RL environments (Grid World, Atari games) where explicit train/validation/test dataset splits (percentages or sample counts) are not typically defined as in supervised learning tasks. It describes training for 50M steps and evaluating performance through human normalized scores, but not specific dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | For fair comparison, we use the same setup of network architectures and hyper-parameters as in [Mnih et al., 2015] for both DQN and DBS-DQN.
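The paper's pseudocode (Algorithm 1) is not reproduced on this page. Below is a minimal Python sketch of the core idea it builds on: the Boltzmann softmax operator with a time-varying inverse temperature beta_t replaces the max in the bootstrapped target. The function names (boltzmann_softmax, dbs_target, beta_schedule) and the quadratic schedule are illustrative assumptions, not the authors' exact implementation.

import numpy as np

def boltzmann_softmax(q_values, beta):
    """Boltzmann softmax operator: a weighted average of q_values with
    softmax weights at inverse temperature beta. As beta -> infinity it
    approaches max(q_values); at beta = 0 it reduces to the mean."""
    q = np.asarray(q_values, dtype=np.float64)
    z = q - q.max()              # shift for numerical stability; cancels in the ratio
    w = np.exp(beta * z)
    return float(np.sum(w * q) / np.sum(w))

def dbs_target(reward, next_q_values, beta_t, gamma=0.99, done=False):
    """One-step bootstrapped target in which the usual max over next-state
    action values is replaced by the Boltzmann softmax at beta_t."""
    if done:
        return reward
    return reward + gamma * boltzmann_softmax(next_q_values, beta_t)

def beta_schedule(t, c=1.0):
    """Illustrative dynamic schedule (an assumption): any positive,
    increasing, unbounded sequence of beta_t works, e.g. c * t**2."""
    return c * float(t) ** 2

# Example: target for one transition at training step t = 100.
q_next = [1.0, 2.0, 0.5]
print(dbs_target(reward=0.1, next_q_values=q_next, beta_t=beta_schedule(100)))

With a small beta_t the target behaves like an average over next-state action values, and as beta_t grows without bound the operator approaches the max; this dynamic schedule is what lets DBS recover the convergence guarantee that a fixed-temperature softmax operator lacks.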