Reinforcement Learning with Dynamic Boltzmann Softmax Updates
Authors: Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on Grid World show that the DBS operator enables better estimation of the value function, which rectifies the convergence issue of the softmax operator. Finally, we propose the DBS-DQN algorithm by applying the DBS operator, which outperforms DQN substantially in 40 out of 49 Atari games. |
| Researcher Affiliation | Collaboration | Ling Pan (IIIS, Tsinghua University), Qingpeng Cai (Alibaba Group), Qi Meng (Microsoft Research), Wei Chen (Microsoft Research), Longbo Huang (IIIS, Tsinghua University) |
| Pseudocode | Yes | Algorithm 1 DBS Deep Q-Network (a minimal illustrative sketch of the DBS operator follows the table) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it explicitly state that the code is open-source or available. |
| Open Datasets | Yes | We first evaluate DBS value iteration and DBS Q-learning on a tabular game, the Grid World. We then evaluate the DBS-DQN algorithm on 49 Atari video games from the Arcade Learning Environment [Bellemare et al., 2013], a standard challenging benchmark for deep reinforcement learning algorithms, by comparing it with DQN. |
| Dataset Splits | No | The paper uses RL environments (Grid World, Atari games), where explicit train/validation/test splits (percentages or sample counts) are not defined as they would be for supervised learning. It describes training for 50M steps and evaluating performance via human-normalized scores, but reports no dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | For fair comparison, we use the same setup of network architectures and hyper-parameters as in [Mnih et al., 2015] for both DQN and DBS-DQN. |
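
Since the authors do not release code (see the Open Source Code row), the following is a minimal Python sketch of the dynamic Boltzmann softmax (DBS) idea referenced in the Pseudocode row: tabular value iteration in which the max over actions is replaced by a Boltzmann softmax whose inverse temperature β_t grows across iterations, so the operator approaches the max operator over time. The toy MDP, tensor shapes, and the β_t = t² schedule are illustrative assumptions, not the paper's exact Algorithm 1 (which is a DQN variant).

```python
import numpy as np

def boltzmann_softmax(q_values, beta):
    # Boltzmann softmax operator: softmax-weighted average of Q-values with
    # inverse temperature beta (beta -> infinity recovers the max operator).
    z = beta * (q_values - np.max(q_values))   # shift for numerical stability
    w = np.exp(z)
    w /= w.sum()
    return float(np.dot(w, q_values))

def dbs_value_iteration(P, R, gamma, num_iters, beta_schedule):
    # Tabular DBS value iteration sketch: the max in the Bellman backup is
    # replaced by the Boltzmann softmax with a rising beta_t.
    # P: transition tensor of shape (S, A, S); R: reward matrix of shape (S, A).
    num_states, num_actions = R.shape
    V = np.zeros(num_states)
    for t in range(1, num_iters + 1):
        beta_t = beta_schedule(t)              # e.g. t**2, so the operator tends to max
        Q = R + gamma * (P @ V)                # one-step lookahead, shape (S, A)
        V = np.array([boltzmann_softmax(Q[s], beta_t) for s in range(num_states)])
    return V

# Example usage on a small random MDP with the polynomial schedule beta_t = t**2
# (an assumed schedule chosen for illustration).
rng = np.random.default_rng(0)
S, A = 5, 3
P = rng.dirichlet(np.ones(S), size=(S, A))     # valid transition probabilities
R = rng.random((S, A))
V = dbs_value_iteration(P, R, gamma=0.9, num_iters=200, beta_schedule=lambda t: t ** 2)
print(V)
```

Because β_t increases with t, early backups are smooth (averaging over actions) while later backups approach the standard greedy backup, which is the mechanism the paper credits for rectifying the convergence issue of a fixed-temperature softmax operator.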