Agent57: Outperforming the Atari Human Benchmark
Authors: Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Zhaohan Daniel Guo, Charles Blundell
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games. To achieve this result, we train a neural network which parameterizes a family of policies ranging from very exploratory to purely exploitative. We propose an adaptive mechanism to choose which policy to prioritize throughout the training process. Additionally, we utilize a novel parameterization of the architecture that allows for more consistent and stable learning. |
| Researcher Affiliation | Industry | DeepMind. Correspondence to: Adrià Puigdomènech Badia <adriap@google.com>. |
| Pseudocode | No | The paper describes algorithms and methods in prose and figures, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code, nor does it state that code is released. |
| Open Datasets | Yes | The Arcade Learning Environment (ALE; Bellemare et al., 2013) was proposed as a platform for empirically assessing agents designed for general competency across a wide range of games. ALE offers an interface to a diverse set of Atari 2600 game environments designed to be engaging and challenging for human players. |
| Dataset Splits | No | The paper mentions evaluating over 3 seeds and reporting a windowed mean over 50 episodes, but does not provide training, validation, and test splits (as percentages or sample counts) that would be needed for reproduction. It refers to a 'separate evaluator process' but not a distinct validation split. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper mentions various algorithms and agents (DQN, MuZero, R2D2, NGU) but does not provide specific software names with version numbers for reproducibility. |
| Experiment Setup | Yes | Following NGU, Agent57 uses a family of coefficients {(β_j, γ_j)}_{j=0}^{N−1} of size N = 32. The choice of discounts {γ_j}_{j=0}^{N−1} differs from that of NGU to allow for higher values, ranging from 0.99 to 0.9999 (see App. G.1 for details). The meta-controller uses a window size of τ = 160 episodes and ε = 0.5 for the actors, and a window size of τ = 3600 episodes and ε = 0.01 for the evaluator. All the other hyperparameters are identical to those of NGU, including the standard preprocessing of Atari frames. For a complete description of the hyperparameters and preprocessing we use, please see App. G.3. |
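To make the quoted setup concrete, here is a minimal sketch of how a family of N = 32 discounts spanning 0.99 to 0.9999 might be constructed, interpolating log-linearly in (1 − γ) so the effective horizons 1/(1 − γ) are spread geometrically. The spacing scheme is an assumption for illustration; the paper's exact schedule is given in its App. G.1.

```python
import math

def gamma_family(n=32, gamma_min=0.99, gamma_max=0.9999):
    """Illustrative discount schedule, log-linear in (1 - gamma).

    Assumed spacing for illustration, not the paper's App. G.1 formula.
    """
    gammas = []
    for j in range(n):
        t = j / (n - 1)  # interpolation parameter in [0, 1]
        # Interpolate log(1 - gamma) between the two endpoints.
        log_one_minus = (1 - t) * math.log(1 - gamma_min) + t * math.log(1 - gamma_max)
        gammas.append(1.0 - math.exp(log_one_minus))
    return gammas
```

By construction the first entry is 0.99 and the last is 0.9999, matching the range quoted above.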
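The meta-controller's windowed, ε-greedy choice of which policy to run can be sketched as below. Each arm j indexes one (β_j, γ_j) pair, and arms are scored by the mean undiscounted return over their last τ episodes. This is a simplified sketch (plain ε-greedy over windowed means); the paper's bandit may add a UCB-style exploration bonus, and all class and method names here are hypothetical.

```python
import random
from collections import deque

class SlidingWindowMetaController:
    """Hypothetical sketch of a non-stationary bandit over policy indices."""

    def __init__(self, num_arms=32, tau=160, epsilon=0.5):
        self.num_arms = num_arms
        self.epsilon = epsilon
        # One fixed-length window of recent episode returns per arm.
        self.returns = [deque(maxlen=tau) for _ in range(num_arms)]

    def select_arm(self):
        # Explore uniformly at random with probability epsilon.
        if random.random() < self.epsilon:
            return random.randrange(self.num_arms)
        # Try each arm at least once before exploiting.
        untried = [j for j in range(self.num_arms) if not self.returns[j]]
        if untried:
            return random.choice(untried)
        # Exploit: pick the arm with the highest windowed mean return.
        means = [sum(w) / len(w) for w in self.returns]
        return max(range(self.num_arms), key=means.__getitem__)

    def update(self, arm, episode_return):
        self.returns[arm].append(episode_return)
```

With the values quoted above, actors would instantiate this with tau=160, epsilon=0.5, and the evaluator with tau=3600, epsilon=0.01.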