Agent57: Outperforming the Atari Human Benchmark

Authors: Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Zhaohan Daniel Guo, Charles Blundell

ICML 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games. To achieve this result, we train a neural network which parameterizes a family of policies ranging from very exploratory to purely exploitative. We propose an adaptive mechanism to choose which policy to prioritize throughout the training process. Additionally, we utilize a novel parameterization of the architecture that allows for more consistent and stable learning. (A hedged sketch of such an adaptive mechanism follows the table.) |
| Researcher Affiliation | Industry | DeepMind. Correspondence to: Adrià Puigdomènech Badia <adriap@google.com>. |
| Pseudocode | No | The paper describes its algorithms and methods in prose and figures, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide access to source code, nor does it state that code has been released. |
| Open Datasets | Yes | The Arcade Learning Environment (ALE; Bellemare et al., 2013) was proposed as a platform for empirically assessing agents designed for general competency across a wide range of games. ALE offers an interface to a diverse set of Atari 2600 game environments designed to be engaging and challenging for human players. |
| Dataset Splits | No | The paper mentions evaluating over 3 seeds with a windowed mean over 50 episodes, but does not provide training, validation, and test splits as percentages or sample counts for reproduction. It refers to a 'separate evaluator process' but not to a distinct validation split. (The windowed-mean aggregation is sketched after the table.) |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper names various algorithms and agents (DQN, MuZero, R2D2, NGU) but does not list specific software packages with version numbers for reproducibility. |
| Experiment Setup | Yes | Following NGU, Agent57 uses a family of coefficients {(β_j, γ_j)}_{j=0}^{N−1} of size N = 32. The choice of discounts {γ_j}_{j=0}^{N−1} differs from that of NGU to allow for higher values, ranging from 0.99 to 0.9999 (see App. G.1 for details). The meta-controller uses a window size of τ = 160 episodes and ε = 0.5 for the actors, and a window size of τ = 3600 episodes and ε = 0.01 for the evaluator. All other hyperparameters are identical to those of NGU, including the standard preprocessing of Atari frames. For a complete description of the hyperparameters and preprocessing, see App. G.3. (An illustrative sketch of the (β_j, γ_j) schedule follows the table.) |
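
The adaptive mechanism quoted under Research Type is the meta-controller: a non-stationary bandit that picks which policy index j each actor runs next. The paper describes it as a sliding-window UCB bandit with ε-greedy exploration, but the sketch below is only a minimal illustration under that reading: the class name, the exploration-bonus scale `beta_ucb`, and the use of the undiscounted extrinsic episode return as the bandit reward are assumptions, not the paper's exact algorithm.

```python
import math
import random
from collections import deque

class SlidingWindowUCB:
    """Illustrative meta-controller: a non-stationary bandit over the
    N = 32 policy indices j (one arm per (beta_j, gamma_j) pair).
    Arm rewards are assumed to be undiscounted extrinsic episode returns."""

    def __init__(self, num_arms=32, window=160, epsilon=0.5, beta_ucb=1.0):
        self.num_arms = num_arms
        self.epsilon = epsilon        # probability of a uniform-random arm
        self.beta_ucb = beta_ucb      # exploration-bonus scale (assumed)
        self.history = deque(maxlen=window)  # sliding window of (arm, return)

    def select(self):
        # Empirical counts and sums restricted to the current window.
        counts = [0] * self.num_arms
        sums = [0.0] * self.num_arms
        for arm, r in self.history:
            counts[arm] += 1
            sums[arm] += r
        # Any arm unseen in the window gets pulled first.
        for arm in range(self.num_arms):
            if counts[arm] == 0:
                return arm
        # epsilon-greedy over windowed UCB scores.
        if random.random() < self.epsilon:
            return random.randrange(self.num_arms)
        total = len(self.history)

        def ucb(arm):
            mean = sums[arm] / counts[arm]
            bonus = self.beta_ucb * math.sqrt(math.log(total) / counts[arm])
            return mean + bonus

        return max(range(self.num_arms), key=ucb)

    def update(self, arm, episode_return):
        # Oldest entries fall out of the deque automatically.
        self.history.append((arm, episode_return))
```

Per the quoted setup, each actor would instantiate this with `window=160, epsilon=0.5` and the evaluator with `window=3600, epsilon=0.01`, calling `update` after every finished episode.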
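
The Dataset Splits row notes that performance is reported as a windowed mean over 50 episodes across 3 seeds rather than via dataset splits. A minimal sketch of that aggregation, assuming per-seed lists of undiscounted episode returns (the paper's exact evaluator bookkeeping is not specified in the excerpt):

```python
import numpy as np

def windowed_score(returns_per_seed, window=50):
    """Mean return over the last `window` evaluation episodes of each
    seed, then averaged across seeds (quoted protocol: 3 seeds,
    windowed mean over 50 episodes). `returns_per_seed` is assumed to
    be a list of per-seed lists of undiscounted episode returns."""
    per_seed = [np.mean(r[-window:]) for r in returns_per_seed]
    return float(np.mean(per_seed))
```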
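
The Experiment Setup row quotes a family of N = 32 pairs (β_j, γ_j) with discounts ranging from 0.99 to 0.9999; the exact schedules are given in App. G.1 of the paper and in the NGU paper. The sketch below is an illustrative assumption, not that schedule: an NGU-style sigmoid ramp for β_j (with the NGU default β_max = 0.3 assumed) and a log-linear interpolation of 1 − γ between the quoted endpoints, with j = 0 the exploitative member (β = 0, highest discount).

```python
import numpy as np

def mixture_schedule(n=32, beta_max=0.3, gamma_min=0.99, gamma_max=0.9999):
    """Illustrative (beta_j, gamma_j) family for j = 0..n-1.

    Assumed, not the paper's exact App. G.1 schedule:
    - beta_j ramps from 0 (j = 0, purely exploitative) to beta_max
      (j = n-1, most exploratory) via an NGU-style sigmoid.
    - 1 - gamma_j is log-linearly spaced so that gamma_0 = gamma_max
      and gamma_{n-1} = gamma_min, matching the quoted 0.99..0.9999 range.
    """
    j = np.arange(n)
    # Sigmoid spacing for the intrinsic-reward weights.
    beta = beta_max / (1.0 + np.exp(-10.0 * (2.0 * j - (n - 2)) / (n - 2)))
    beta[0], beta[-1] = 0.0, beta_max  # pin the endpoints exactly
    # Log-linear spacing of 1 - gamma between the quoted endpoints.
    gamma = 1.0 - np.exp(
        np.linspace(np.log(1.0 - gamma_max), np.log(1.0 - gamma_min), n)
    )
    return list(zip(beta.tolist(), gamma.tolist()))
```

Under this assumed spacing, exploratory members (high j) pair a large intrinsic-reward weight with a shorter effective horizon, while the exploitative member j = 0 uses pure extrinsic reward at γ = 0.9999.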