Dueling Network Architectures for Deep Reinforcement Learning
Authors: Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions. Moreover, the dueling architecture enables our RL agent to outperform the state-of-the-art on the Atari 2600 domain. |
| Researcher Affiliation | Industry | Ziyu Wang ZIYU@GOOGLE.COM, Tom Schaul SCHAUL@GOOGLE.COM, Matteo Hessel MTTHSS@GOOGLE.COM, Hado van Hasselt HADO@GOOGLE.COM, Marc Lanctot LANCTOT@GOOGLE.COM, Nando de Freitas NANDODEFREITAS@GOOGLE.COM — Google DeepMind, London, UK |
| Pseudocode | Yes | The pseudocode for DDQN is presented in Appendix A. |
| Open Source Code | No | The paper does not include an explicit statement about releasing its source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We perform a comprehensive evaluation of our proposed method on the Arcade Learning Environment (Bellemare et al., 2013), which is composed of 57 Atari games. |
| Dataset Splits | No | The paper mentions experience replay and evaluation metrics, but it does not specify explicit training, validation, and test dataset splits with percentages or counts for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions using specific algorithms and techniques like 'DDQN' and 'Prioritized Replay' but does not specify version numbers for any programming languages, libraries, or frameworks used (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | Our network architecture has the same low-level convolutional structure of DQN (Mnih et al., 2015; van Hasselt et al., 2015). There are 3 convolutional layers followed by 2 fully-connected layers. The first convolutional layer has 32 8×8 filters with stride 4, the second 64 4×4 filters with stride 2, and the third and final convolutional layer 64 3×3 filters with stride 1. [...] We combine the value and advantage streams using the module described by Equation (9). Rectifier non-linearities (Fukushima, 1980) are inserted between all adjacent layers. [...] we settled on 6.25 × 10⁻⁵ for the learning rate and 10 for the gradient clipping norm (the same as in the previous section). |
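
The experiment-setup excerpt above is detailed enough to sketch the network. Below is a minimal, hedged reconstruction in PyTorch (not the authors' original code): the convolutional sizes come from the quoted text, while the 84×84×4 input frames and the 512-unit hidden layer in each stream are assumptions taken from the standard DQN setup the paper builds on. The final line implements the mean-subtracted aggregation module the excerpt refers to as Equation (9).

```python
# Sketch of the dueling Q-network described in the Experiment Setup excerpt.
# Assumptions (not stated in the excerpt): 84x84x4 stacked-frame input and
# 512-unit fully-connected streams, as in the standard DQN architecture.
import torch
import torch.nn as nn


class DuelingQNetwork(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        # Shared convolutional trunk: 32 8x8/4, 64 4x4/2, 64 3x3/1 (from the excerpt).
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        conv_out = 64 * 7 * 7  # spatial size after the trunk for 84x84 inputs
        # Two streams: a scalar state-value V(s) and per-action advantages A(s, a).
        self.value = nn.Sequential(
            nn.Linear(conv_out, 512), nn.ReLU(), nn.Linear(512, 1))
        self.advantage = nn.Sequential(
            nn.Linear(conv_out, 512), nn.ReLU(), nn.Linear(512, num_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x).flatten(start_dim=1)
        v = self.value(h)          # shape (batch, 1)
        a = self.advantage(h)      # shape (batch, num_actions)
        # Aggregation module (Equation (9) in the paper): subtracting the mean
        # advantage keeps the V/A decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```

Per the quoted setup, such a network would be trained with the DDQN update, a learning rate of 6.25 × 10⁻⁵, and gradients clipped to norm 10; those optimizer details are reported in the paper but are not encoded in the module above.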