HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning
Authors: Ziniu Li, Yingru Li, Yushun Zhang, Tong Zhang, Zhi-Quan Luo
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present numerical experiments to validate the efficiency of the proposed method. Experiment details can be found in Appendix C. Our first experiment is on the Arcade Learning Environment (Bellemare et al., 2013)... The empirical result suggests HyperDQN with 20M frames outperforms DQN (Mnih et al., 2015) with 200M frames in terms of the maximum human-normalized score. |
| Researcher Affiliation | Academia | ¹Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China; ²Hong Kong University of Science and Technology. {ziniuli, yingruli, yushunzhang}@link.cuhk.edu.cn, tongzhang@ust.hk, luozq@cuhk.edu.cn |
| Pseudocode | Yes | We outline the proposed method in Algorithm 2 in Appendix. ... Algorithm 3 Hyper Actor Critic (HAC). A hedged sketch of the hypermodel-style action selection underlying these algorithms follows the table. |
| Open Source Code | No | The paper mentions public repositories for baseline implementations but does not provide a link to, or an explicit statement about, open-sourcing HyperDQN's code. |
| Open Datasets | Yes | Our first experiment is on the Arcade Learning Environment (Bellemare et al., 2013)... For another challenging benchmark Super Mario Bros (Kauten, 2018)... |
| Dataset Splits | No | The paper describes training budgets in terms of 'frames' and evaluation in terms of the 'human-normalized score' obtained during interaction. It mentions a test ϵ governing agent behavior during evaluation, but it does not provide the training/validation/test dataset splits that would be common in supervised learning; reinforcement-learning experiments involve continuous interaction with an environment rather than pre-split static datasets. |
| Hardware Specification | No | The paper states 'the 200M frames training budget requires about 30/10 days with a CPU/GPU machine for a game' but does not specify any particular CPU or GPU models, or other hardware details. |
| Software Dependencies | No | The paper mentions 'PyTorch' and the 'tianshou' framework but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Algorithm parameters are listed in Table 4. To stabilize training, each z corresponds to 32 mini-batch samples and the effective batch size of HyperDQN is 32 × 10 = 320. ... For example, the replay buffer size is 1M; the batch size is 32; the discount factor is 0.99; the target network update frequency is 10K agent steps; the train frequency is 4 agent steps; and the replay starts after 50K agent steps. For algorithms with ϵ-greedy (e.g., DQN, BootDQN, OPIQ and OB2I), the exploration ϵ is annealed from 1.0 to 0.1 linearly (from 50K agent steps to 1M agent steps, respectively); the test ϵ is 0.05. A hedged configuration sketch restating these values follows the table. |
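
The algorithm bodies themselves are not quoted in this report, so the following is only a minimal, hypothetical PyTorch sketch of the hypermodel-style randomized action selection that HyperDQN-type methods build on: a random index z is drawn and the greedy action is taken under the Q-function that z induces. The network shapes, the default index dimension, and all identifiers (`HyperQHead`, `select_action`, `feature_net`) are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch only: a Q-head whose last-layer weights are generated
# from a random index z, so each sampled z induces a different greedy policy.
import torch
import torch.nn as nn

class HyperQHead(nn.Module):
    def __init__(self, feature_dim: int, num_actions: int, index_dim: int = 10):
        super().__init__()
        self.feature_dim = feature_dim
        self.num_actions = num_actions
        # Hypermodel: maps an index z to the parameters (W, b) of a linear Q layer.
        self.hyper = nn.Linear(index_dim, (feature_dim + 1) * num_actions)

    def forward(self, features: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        params = self.hyper(z)                                  # (B, (F+1)*A)
        W = params[:, : self.feature_dim * self.num_actions]
        W = W.view(-1, self.num_actions, self.feature_dim)      # (B, A, F)
        b = params[:, self.feature_dim * self.num_actions:]     # (B, A)
        return torch.einsum("baf,bf->ba", W, features) + b      # Q(s, ·; z)

def select_action(q_head, feature_net, obs: torch.Tensor, z: torch.Tensor) -> int:
    """Randomized exploration: sample one z per episode, act greedily under it."""
    with torch.no_grad():
        phi = feature_net(obs.unsqueeze(0))    # (1, F) state features
        q = q_head(phi, z.unsqueeze(0))        # (1, A) Q-values induced by z
    return int(q.argmax(dim=-1).item())
```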
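
To make the quoted setup concrete, here is a hedged configuration sketch that restates the hyperparameters listed in the Experiment Setup row, together with the linear ϵ-annealing schedule it describes. The dictionary keys and the `epsilon_at` helper are illustrative names, not identifiers from the paper's code.

```python
# Hedged sketch of the reported Atari experiment setup; values come from the
# Experiment Setup row above, names are illustrative.
ATARI_CONFIG = {
    "replay_buffer_size": 1_000_000,   # 1M transitions
    "batch_size": 32,                  # per index z; HyperDQN's effective batch is 32 * 10 = 320
    "discount_factor": 0.99,
    "target_update_freq": 10_000,      # agent steps between target-network updates
    "train_freq": 4,                   # agent steps between gradient updates
    "learning_starts": 50_000,         # replay starts after 50K agent steps
    "test_epsilon": 0.05,              # for epsilon-greedy baselines at evaluation
}

def epsilon_at(step: int, start: int = 50_000, end: int = 1_000_000,
               eps_start: float = 1.0, eps_end: float = 0.1) -> float:
    """Linearly anneal the exploration epsilon from eps_start to eps_end
    between `start` and `end` agent steps."""
    if step <= start:
        return eps_start
    if step >= end:
        return eps_end
    frac = (step - start) / (end - start)
    return eps_start + frac * (eps_end - eps_start)
```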