Bigger, Better, Faster: Human-level Atari with human-level efficiency
Authors: Max Schwarzer, Johan Samir Obando Ceron, Aaron Courville, Marc G Bellemare, Rishabh Agarwal, Pablo Samuel Castro
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available. |
| Researcher Affiliation | Collaboration | 1Google DeepMind 2Mila 3Université de Montréal. Correspondence to: Max Schwarzer <Max A.Schwarzer@gmail.com>, Johan Obando Ceron <jobando0730@gmail.com>. |
| Pseudocode | No | The paper describes the algorithms and components in prose, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make our code and data publicly available. |
| Open Datasets | Yes | Mnih et al. (2015a) introduced the agent DQN by combining temporal-difference learning with deep networks, and demonstrated its capabilities in achieving human-level performance on the Arcade Learning Environment (ALE) (Bellemare et al., 2013). Kaiser et al. (2020) introduced the Atari 100K benchmark |
| Dataset Splits | Yes | While the Atari 100K training set consists of 26 games, we evaluate the performance of various components in BBF on 29 validation games in the ALE that are not in Atari 100K. |
| Hardware Specification | Yes | IRIS uses half of an A100 GPU for a week per run. SR-SPR, at its highest replay ratio of 16, uses 25% of an A100 GPU and a single CPU for roughly 24 hours. Our BBF agent at replay ratio 8 takes only 10 hours with a single CPU and half of an A100 GPU. |
| Software Dependencies | No | The paper mentions software like 'Dopamine framework', 'Python', 'NumPy', 'Matplotlib', and 'JAX', but does not provide specific version numbers for any of these components. |
| Experiment Setup | Yes | For BBF, we use RR=8 in order to balance the increased computation arising from our large network. Our n-step schedule... decreases exponentially from 10 to 3 over the first 10K gradient steps following each network reset... reset every 40k gradient steps. We choose γ1 = 0.97, slightly lower than the typical discount used for Atari, and γ2 = 0.997. We incorporate weight decay... use the AdamW optimizer (Loshchilov & Hutter, 2019) with a weight decay value of 0.1. |
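
The Experiment Setup row above describes BBF's schedules only in prose: the n-step return is annealed exponentially from 10 to 3 over the first 10K gradient steps after each network reset (resets occur every 40K gradient steps), and the discount moves from γ1 = 0.97 to γ2 = 0.997 over the same window. The sketch below is a minimal, hypothetical illustration of such a schedule, not the authors' implementation (which lives in their released JAX/Dopamine code); the geometric-interpolation form, the modular reset bookkeeping, and names such as `schedule` and `steps_since_reset` are assumptions made here for clarity.

```python
# Hypothetical sketch of BBF-style n-step and discount annealing.
# The exact functional form used by the authors may differ; only the
# endpoints (10 -> 3, 0.97 -> 0.997), the 10K-step anneal window, and the
# 40K-step reset period are taken from the quoted setup.

ANNEAL_STEPS = 10_000        # anneal over first 10K gradient steps after each reset
RESET_PERIOD = 40_000        # network reset every 40K gradient steps
N_MAX, N_MIN = 10, 3         # n-step return length: 10 -> 3
GAMMA_1, GAMMA_2 = 0.97, 0.997  # discount factor: 0.97 -> 0.997


def exponential_interp(start: float, end: float, fraction: float) -> float:
    """Geometric interpolation between start and end; fraction in [0, 1]."""
    return start * (end / start) ** fraction


def schedule(grad_step: int) -> tuple[int, float]:
    """Return (n, gamma) for a global gradient step, restarting after each reset."""
    steps_since_reset = grad_step % RESET_PERIOD
    frac = min(steps_since_reset / ANNEAL_STEPS, 1.0)
    n = int(round(exponential_interp(N_MAX, N_MIN, frac)))
    gamma = exponential_interp(GAMMA_1, GAMMA_2, frac)
    return n, gamma


if __name__ == "__main__":
    # At step 0 the schedule starts at (10, 0.97); by 10K steps it reaches
    # (3, 0.997); at 40K steps a reset restarts the anneal.
    for step in (0, 5_000, 10_000, 39_999, 40_000):
        print(step, schedule(step))
```

A usage note: under this sketch the anneal restarts automatically at every multiple of `RESET_PERIOD`, mirroring the quoted statement that the schedule runs "following each network reset"; the AdamW optimizer with weight decay 0.1 mentioned in the same row would be configured separately in the training loop.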