Dueling Network Architectures for Deep Reinforcement Learning
Authors: Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions. Moreover, the dueling architecture enables our RL agent to outperform the state-of-the-art on the Atari 2600 domain. |
| Researcher Affiliation | Industry | Ziyu Wang ZIYU@GOOGLE.COM, Tom Schaul SCHAUL@GOOGLE.COM, Matteo Hessel MTTHSS@GOOGLE.COM, Hado van Hasselt HADO@GOOGLE.COM, Marc Lanctot LANCTOT@GOOGLE.COM, Nando de Freitas NANDODEFREITAS@GOOGLE.COM — Google DeepMind, London, UK |
| Pseudocode | Yes | The pseudocode for DDQN is presented in Appendix A. |
| Open Source Code | No | The paper does not include an explicit statement about releasing its source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We perform a comprehensive evaluation of our proposed method on the Arcade Learning Environment (Bellemare et al., 2013), which is composed of 57 Atari games. |
| Dataset Splits | No | The paper mentions experience replay and evaluation metrics, but it does not specify explicit training, validation, and test dataset splits with percentages or counts for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions using specific algorithms and techniques like 'DDQN' and 'Prioritized Replay' but does not specify version numbers for any programming languages, libraries, or frameworks used (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | Our network architecture has the same low-level convolutional structure of DQN (Mnih et al., 2015; van Hasselt et al., 2015). There are 3 convolutional layers followed by 2 fully-connected layers. The first convolutional layer has 32 8×8 filters with stride 4, the second 64 4×4 filters with stride 2, and the third and final convolutional layer 64 3×3 filters with stride 1. [...] We combine the value and advantage streams using the module described by Equation (9). Rectifier non-linearities (Fukushima, 1980) are inserted between all adjacent layers. [...] we settled on 6.25 × 10⁻⁵ for the learning rate and 10 for the gradient clipping norm (the same as in the previous section). |
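
The experiment-setup excerpt above is detailed enough to sketch the network. Below is a minimal, hedged reconstruction in PyTorch (not the authors' original code): the convolutional sizes come from the quoted text, while the 84×84×4 input frames and the 512-unit hidden layer in each stream are assumptions taken from the standard DQN setup the paper builds on. The final line implements the mean-subtracted aggregation module the excerpt refers to as Equation (9).

```python
# Sketch of the dueling Q-network described in the Experiment Setup excerpt.
# Assumptions (not stated in the excerpt): 84x84x4 stacked-frame input and
# 512-unit fully-connected streams, as in the standard DQN architecture.
import torch
import torch.nn as nn


class DuelingQNetwork(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        # Shared convolutional trunk: 32 8x8/4, 64 4x4/2, 64 3x3/1 (from the excerpt).
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        conv_out = 64 * 7 * 7  # spatial size after the trunk for 84x84 inputs
        # Two streams: a scalar state-value V(s) and per-action advantages A(s, a).
        self.value = nn.Sequential(
            nn.Linear(conv_out, 512), nn.ReLU(), nn.Linear(512, 1))
        self.advantage = nn.Sequential(
            nn.Linear(conv_out, 512), nn.ReLU(), nn.Linear(512, num_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x).flatten(start_dim=1)
        v = self.value(h)          # shape (batch, 1)
        a = self.advantage(h)      # shape (batch, num_actions)
        # Aggregation module (Equation (9) in the paper): subtracting the mean
        # advantage keeps the V/A decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```

Per the quoted setup, such a network would be trained with the DDQN update, a learning rate of 6.25 × 10⁻⁵, and gradients clipped to norm 10; those optimizer details are reported in the paper but are not encoded in the module above.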