Asynchronous Methods for Deep Reinforcement Learning

Authors: Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform most of our experiments using the Arcade Learning Environment (Bellemare et al., 2012), which provides a simulator for Atari 2600 games. This is one of the most commonly used benchmark environments for RL algorithms. We use the Atari domain to compare against state of the art results (Van Hasselt et al., 2015; Wang et al., 2015; Schaul et al., 2015; Nair et al., 2015; Mnih et al., 2015), as well as to carry out a detailed stability and scalability analysis of the proposed methods.
Researcher Affiliation | Collaboration | 1 Google DeepMind; 2 Montreal Institute for Learning Algorithms (MILA), University of Montreal
Pseudocode | Yes | Algorithm 1: Asynchronous one-step Q-learning pseudocode for each actor-learner thread. (A hedged sketch of this per-thread update loop is given after the table.)
Open Source Code | No | The paper links to videos of learned behaviors (e.g., 'A video showing the learned driving behavior of the A3C agent can be found at https://youtu.be/0xo1Ldx3L5Q.') but does not provide any statement or link releasing the source code for its methods.
Open Datasets | Yes | We perform most of our experiments using the Arcade Learning Environment (Bellemare et al., 2012), which provides a simulator for Atari 2600 games. (A minimal environment-loading sketch is given after the table.)
Dataset Splits | No | The paper mentions tuning hyperparameters with a search on six Atari games, which implies a validation step, but it does not state explicit training, validation, and test splits (percentages or counts) for reproducibility; it only refers to existing evaluation protocols.
Hardware Specification | Yes | Figure 1 compares the learning speed of the DQN algorithm trained on an Nvidia K40 GPU with the asynchronous methods trained using 16 CPU cores on five Atari 2600 games.
Software Dependencies | No | The paper describes algorithms and optimizers (e.g., RMSProp) but does not list specific software dependencies with version numbers (e.g., Python, TensorFlow, or PyTorch versions).
Experiment Setup | Yes | Specifically, we tuned hyperparameters (learning rate and amount of gradient norm clipping) using a search on six Atari games (Beamrider, Breakout, Pong, Q*bert, Seaquest and Space Invaders) and then fixed all hyperparameters for all 57 games. We trained both a feedforward agent with the same architecture as (Mnih et al., 2015; Nair et al., 2015; Van Hasselt et al., 2015) as well as a recurrent agent with an additional 256 LSTM cells after the final hidden layer. (A hedged sketch of these two architectures is given after the table.)
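
For the Pseudocode row: the per-thread loop of Algorithm 1 (asynchronous one-step Q-learning) can be summarized as follows. This is a minimal sketch, not the paper's implementation; the linear Q-function, the stand-in environment, the fixed epsilon, the lock around shared updates (the paper applies updates lock-free, Hogwild-style), and all hyperparameter values are illustrative assumptions.

```python
# Minimal sketch of one actor-learner thread from Algorithm 1 (asynchronous
# one-step Q-learning). Assumptions not taken from the paper: a linear Q-function
# over toy features, a stand-in environment, a fixed epsilon, a lock around shared
# updates, and all hyperparameter values.
import threading
import numpy as np

N_FEATURES, N_ACTIONS = 8, 4
GAMMA, LR, EPSILON = 0.99, 1e-3, 0.1
I_TARGET, I_ASYNC_UPDATE, T_MAX = 1_000, 5, 50_000

shared_theta = np.zeros((N_ACTIONS, N_FEATURES))   # shared parameters theta
target_theta = shared_theta.copy()                 # target parameters theta^-
global_step = 0
lock = threading.Lock()

class ToyEnv:
    """Stand-in environment emitting random features; illustrative only."""
    def reset(self):
        return np.random.randn(N_FEATURES)
    def step(self, action):
        return np.random.randn(N_FEATURES), np.random.randn(), np.random.rand() < 0.01

def q_values(theta, s):
    return theta @ s                               # Q(s, a) for every action a

def actor_learner(env):
    global global_step, target_theta
    grad_accum = np.zeros_like(shared_theta)       # accumulated gradient for theta
    t, s = 0, env.reset()
    while global_step < T_MAX:
        # Choose an action epsilon-greedily from the shared network.
        if np.random.rand() < EPSILON:
            a = np.random.randint(N_ACTIONS)
        else:
            a = int(np.argmax(q_values(shared_theta, s)))
        s_next, r, done = env.step(a)
        # One-step Q-learning target computed with the target network theta^-.
        y = r if done else r + GAMMA * np.max(q_values(target_theta, s_next))
        td_error = y - q_values(shared_theta, s)[a]
        grad_accum[a] += td_error * s              # accumulate gradient of the TD loss
        s = env.reset() if done else s_next
        t += 1
        with lock:
            global_step += 1
            if global_step % I_TARGET == 0:
                target_theta = shared_theta.copy() # periodic target-network sync
            if t % I_ASYNC_UPDATE == 0 or done:
                shared_theta += LR * grad_accum    # apply accumulated update
                grad_accum[:] = 0.0

# Each thread owns its own environment instance, as in the paper's setup.
threads = [threading.Thread(target=actor_learner, args=(ToyEnv(),)) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```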
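
For the Open Datasets row: the benchmark itself is publicly available through the Arcade Learning Environment. Below is a hedged sketch of loading an Atari 2600 game through the modern ale-py/Gymnasium wrapper; this tooling postdates the 2016 paper, and the game choice, environment ID, and installation command are assumptions for illustration only.

```python
# Minimal sketch of loading an Atari 2600 game via the Arcade Learning Environment.
# Assumes `pip install "gymnasium[atari,accept-rom-license]"`; the environment ID and
# game are illustrative, and this modern wrapper is not what the paper used in 2016.
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)        # registration step needed with recent ale-py/Gymnasium
env = gym.make("ALE/Breakout-v5")
obs, info = env.reset(seed=0)
for _ in range(100):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```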
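
For the Experiment Setup row: the quote describes a feedforward agent plus a recurrent variant with 256 LSTM cells after the final hidden layer. The following is a hedged PyTorch sketch of that pair of variants; the framework, the convolutional layer sizes, the 84x84 four-frame input, and the actor-critic heads are assumptions drawn from the DQN-lineage papers cited, not details stated in the quote.

```python
# Hedged PyTorch sketch of the two agent variants described in the Experiment Setup
# row: a feedforward torso and a recurrent variant with 256 LSTM cells after the
# final hidden layer. Framework, layer sizes, input shape, and actor-critic heads
# are illustrative assumptions, not specifications taken from the paper.
import torch
import torch.nn as nn

class AsyncAgentNet(nn.Module):
    def __init__(self, n_actions, recurrent=False):
        super().__init__()
        # Convolutional torso over stacked 84x84 frames (assumed preprocessing).
        self.torso = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
        )
        self.recurrent = recurrent
        if recurrent:
            # Recurrent variant: an additional 256 LSTM cells after the final hidden layer.
            self.lstm = nn.LSTMCell(256, 256)
        # Actor-critic heads: policy logits and a scalar state-value estimate.
        self.policy = nn.Linear(256, n_actions)
        self.value = nn.Linear(256, 1)

    def forward(self, frames, hidden=None):
        x = self.torso(frames)
        if self.recurrent:
            hidden = self.lstm(x, hidden)
            x = hidden[0]
        return self.policy(x), self.value(x), hidden

# Example: one forward pass through the recurrent variant on a dummy observation.
net = AsyncAgentNet(n_actions=6, recurrent=True)
logits, value, hidden = net(torch.zeros(1, 4, 84, 84))
```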