Asynchronous Methods for Deep Reinforcement Learning

Authors: Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform most of our experiments using the Arcade Learning Environment (Bellemare et al., 2012), which provides a simulator for Atari 2600 games. This is one of the most commonly used benchmark environments for RL algorithms. We use the Atari domain to compare against state of the art results (Van Hasselt et al., 2015; Wang et al., 2015; Schaul et al., 2015; Nair et al., 2015; Mnih et al., 2015), as well as to carry out a detailed stability and scalability analysis of the proposed methods.
Researcher Affiliation | Collaboration | 1 Google DeepMind; 2 Montreal Institute for Learning Algorithms (MILA), University of Montreal
Pseudocode | Yes | Algorithm 1: Asynchronous one-step Q-learning pseudocode for each actor-learner thread. (A hedged sketch of this per-thread update loop is given after the table.)
Open Source Code | No | The paper links to videos of learned behaviors (e.g., 'A video showing the learned driving behavior of the A3C agent can be found at https://youtu.be/0xo1Ldx3L5Q.') but does not provide any statement or link releasing the source code for its methods.
Open Datasets | Yes | We perform most of our experiments using the Arcade Learning Environment (Bellemare et al., 2012), which provides a simulator for Atari 2600 games. (A minimal environment-loading sketch is given after the table.)
Dataset Splits | No | The paper mentions tuning hyperparameters with a search on six Atari games, which implies a validation step, but it does not state explicit training, validation, and test splits (percentages or counts) for reproducibility; it only refers to existing evaluation protocols.
Hardware Specification | Yes | Figure 1 compares the learning speed of the DQN algorithm trained on an Nvidia K40 GPU with the asynchronous methods trained using 16 CPU cores on five Atari 2600 games.
Software Dependencies | No | The paper describes algorithms and optimizers (e.g., RMSProp) but does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, or PyTorch versions).
Experiment Setup | Yes | Specifically, we tuned hyperparameters (learning rate and amount of gradient norm clipping) using a search on six Atari games (Beamrider, Breakout, Pong, Q*bert, Seaquest and Space Invaders) and then fixed all hyperparameters for all 57 games. We trained both a feedforward agent with the same architecture as (Mnih et al., 2015; Nair et al., 2015; Van Hasselt et al., 2015) as well as a recurrent agent with an additional 256 LSTM cells after the final hidden layer. (A hedged sketch of these two architectures is given after the table.)
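
For the Pseudocode row: the per-thread loop of Algorithm 1 (asynchronous one-step Q-learning) can be summarized as follows. This is a minimal sketch, not the paper's implementation; the linear Q-function, the stand-in environment, the fixed epsilon, the lock around shared updates (the paper applies updates lock-free, Hogwild-style), and all hyperparameter values are illustrative assumptions.

```python
# Minimal sketch of one actor-learner thread from Algorithm 1 (asynchronous
# one-step Q-learning). Assumptions not taken from the paper: a linear Q-function
# over toy features, a stand-in environment, a fixed epsilon, a lock around shared
# updates, and all hyperparameter values.
import threading
import numpy as np

N_FEATURES, N_ACTIONS = 8, 4
GAMMA, LR, EPSILON = 0.99, 1e-3, 0.1
I_TARGET, I_ASYNC_UPDATE, T_MAX = 1_000, 5, 50_000

shared_theta = np.zeros((N_ACTIONS, N_FEATURES))   # shared parameters theta
target_theta = shared_theta.copy()                 # target parameters theta^-
global_step = 0
lock = threading.Lock()

class ToyEnv:
    """Stand-in environment emitting random features; illustrative only."""
    def reset(self):
        return np.random.randn(N_FEATURES)
    def step(self, action):
        return np.random.randn(N_FEATURES), np.random.randn(), np.random.rand() < 0.01

def q_values(theta, s):
    return theta @ s                               # Q(s, a) for every action a

def actor_learner(env):
    global global_step, target_theta
    grad_accum = np.zeros_like(shared_theta)       # accumulated gradient for theta
    t, s = 0, env.reset()
    while global_step < T_MAX:
        # Choose an action epsilon-greedily from the shared network.
        if np.random.rand() < EPSILON:
            a = np.random.randint(N_ACTIONS)
        else:
            a = int(np.argmax(q_values(shared_theta, s)))
        s_next, r, done = env.step(a)
        # One-step Q-learning target computed with the target network theta^-.
        y = r if done else r + GAMMA * np.max(q_values(target_theta, s_next))
        td_error = y - q_values(shared_theta, s)[a]
        grad_accum[a] += td_error * s              # accumulate gradient of the TD loss
        s = env.reset() if done else s_next
        t += 1
        with lock:
            global_step += 1
            if global_step % I_TARGET == 0:
                target_theta = shared_theta.copy() # periodic target-network sync
            if t % I_ASYNC_UPDATE == 0 or done:
                shared_theta += LR * grad_accum    # apply accumulated update
                grad_accum[:] = 0.0

# Each thread owns its own environment instance, as in the paper's setup.
threads = [threading.Thread(target=actor_learner, args=(ToyEnv(),)) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```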
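
For the Open Datasets row: the benchmark itself is publicly available through the Arcade Learning Environment. Below is a hedged sketch of loading an Atari 2600 game through the modern ale-py/Gymnasium wrapper; this tooling postdates the 2016 paper, and the game choice, environment ID, and installation command are assumptions for illustration only.

```python
# Minimal sketch of loading an Atari 2600 game via the Arcade Learning Environment.
# Assumes `pip install "gymnasium[atari,accept-rom-license]"`; the environment ID and
# game are illustrative, and this modern wrapper is not what the paper used in 2016.
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)        # registration step needed with recent ale-py/Gymnasium
env = gym.make("ALE/Breakout-v5")
obs, info = env.reset(seed=0)
for _ in range(100):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```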
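
For the Experiment Setup row: the quote describes a feedforward agent plus a recurrent variant with 256 LSTM cells after the final hidden layer. The following is a hedged PyTorch sketch of that pair of variants; the framework, the convolutional layer sizes, the 84x84 four-frame input, and the actor-critic heads are assumptions drawn from the DQN-lineage papers cited, not details stated in the quote.

```python
# Hedged PyTorch sketch of the two agent variants described in the Experiment Setup
# row: a feedforward torso and a recurrent variant with 256 LSTM cells after the
# final hidden layer. Framework, layer sizes, input shape, and actor-critic heads
# are illustrative assumptions, not specifications taken from the paper.
import torch
import torch.nn as nn

class AsyncAgentNet(nn.Module):
    def __init__(self, n_actions, recurrent=False):
        super().__init__()
        # Convolutional torso over stacked 84x84 frames (assumed preprocessing).
        self.torso = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
        )
        self.recurrent = recurrent
        if recurrent:
            # Recurrent variant: an additional 256 LSTM cells after the final hidden layer.
            self.lstm = nn.LSTMCell(256, 256)
        # Actor-critic heads: policy logits and a scalar state-value estimate.
        self.policy = nn.Linear(256, n_actions)
        self.value = nn.Linear(256, 1)

    def forward(self, frames, hidden=None):
        x = self.torso(frames)
        if self.recurrent:
            hidden = self.lstm(x, hidden)
            x = hidden[0]
        return self.policy(x), self.value(x), hidden

# Example: one forward pass through the recurrent variant on a dummy observation.
net = AsyncAgentNet(n_actions=6, recurrent=True)
logits, value, hidden = net(torch.zeros(1, 4, 84, 84))
```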