Agent57: Outperforming the Atari Human Benchmark
Authors: Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Zhaohan Daniel Guo, Charles Blundell
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games. To achieve this result, we train a neural network which parameterizes a family of policies ranging from very exploratory to purely exploitative. We propose an adaptive mechanism to choose which policy to prioritize throughout the training process. Additionally, we utilize a novel parameterization of the architecture that allows for more consistent and stable learning. |
| Researcher Affiliation | Industry | DeepMind. Correspondence to: Adrià Puigdomènech Badia <adriap@google.com>. |
| Pseudocode | No | The paper describes algorithms and methods in prose and figures, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code, nor does it state that code is released. |
| Open Datasets | Yes | The Arcade Learning Environment (ALE; Bellemare et al., 2013) was proposed as a platform for empirically assessing agents designed for general competency across a wide range of games. ALE offers an interface to a diverse set of Atari 2600 game environments designed to be engaging and challenging for human players. |
| Dataset Splits | No | The paper mentions evaluating over 3 seeds and reporting a windowed mean over 50 episodes, but does not provide training, validation, and test splits (as percentages or sample counts) that would be needed for reproduction. It refers to a 'separate evaluator process' but not a distinct validation split. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper mentions various algorithms and agents (DQN, MuZero, R2D2, NGU) but does not provide specific software names with version numbers for reproducibility. |
| Experiment Setup | Yes | Following NGU, Agent57 uses a family of coefficients {(β_j, γ_j)}_{j=0}^{N−1} of size N = 32. The choice of discounts {γ_j}_{j=0}^{N−1} differs from that of NGU to allow for higher values, ranging from 0.99 to 0.9999 (see App. G.1 for details). The meta-controller uses a window size of τ = 160 episodes and ε = 0.5 for the actors, and a window size of τ = 3600 episodes and ε = 0.01 for the evaluator. All the other hyperparameters are identical to those of NGU, including the standard preprocessing of Atari frames. For a complete description of the hyperparameters and preprocessing we use, please see App. G.3. |
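To make the quoted setup concrete, here is a minimal sketch of how a family of N = 32 discounts spanning 0.99 to 0.9999 might be constructed, interpolating log-linearly in (1 − γ) so the effective horizons 1/(1 − γ) are spread geometrically. The spacing scheme is an assumption for illustration; the paper's exact schedule is given in its App. G.1.

```python
import math

def gamma_family(n=32, gamma_min=0.99, gamma_max=0.9999):
    """Illustrative discount schedule, log-linear in (1 - gamma).

    Assumed spacing for illustration, not the paper's App. G.1 formula.
    """
    gammas = []
    for j in range(n):
        t = j / (n - 1)  # interpolation parameter in [0, 1]
        # Interpolate log(1 - gamma) between the two endpoints.
        log_one_minus = (1 - t) * math.log(1 - gamma_min) + t * math.log(1 - gamma_max)
        gammas.append(1.0 - math.exp(log_one_minus))
    return gammas
```

By construction the first entry is 0.99 and the last is 0.9999, matching the range quoted above.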
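The meta-controller's windowed, ε-greedy choice of which policy to run can be sketched as below. Each arm j indexes one (β_j, γ_j) pair, and arms are scored by the mean undiscounted return over their last τ episodes. This is a simplified sketch (plain ε-greedy over windowed means); the paper's bandit may add a UCB-style exploration bonus, and all class and method names here are hypothetical.

```python
import random
from collections import deque

class SlidingWindowMetaController:
    """Hypothetical sketch of a non-stationary bandit over policy indices."""

    def __init__(self, num_arms=32, tau=160, epsilon=0.5):
        self.num_arms = num_arms
        self.epsilon = epsilon
        # One fixed-length window of recent episode returns per arm.
        self.returns = [deque(maxlen=tau) for _ in range(num_arms)]

    def select_arm(self):
        # Explore uniformly at random with probability epsilon.
        if random.random() < self.epsilon:
            return random.randrange(self.num_arms)
        # Try each arm at least once before exploiting.
        untried = [j for j in range(self.num_arms) if not self.returns[j]]
        if untried:
            return random.choice(untried)
        # Exploit: pick the arm with the highest windowed mean return.
        means = [sum(w) / len(w) for w in self.returns]
        return max(range(self.num_arms), key=means.__getitem__)

    def update(self, arm, episode_return):
        self.returns[arm].append(episode_return)
```

With the values quoted above, actors would instantiate this with tau=160, epsilon=0.5, and the evaluator with tau=3600, epsilon=0.01.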