Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Authors: Max Schwarzer, Johan Samir Obando Ceron, Aaron Courville, Marc G Bellemare, Rishabh Agarwal, Pablo Samuel Castro
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available. |
| Researcher Affiliation | Collaboration | ¹Google DeepMind ²Mila ³Université de Montréal. Correspondence to: Max Schwarzer <Max EMAIL>, Johan Obando Ceron <EMAIL>. |
| Pseudocode | No | The paper describes the algorithms and components in prose, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make our code and data publicly available. |
| Open Datasets | Yes | Mnih et al. (2015a) introduced the agent DQN by combining temporal-difference learning with deep networks, and demonstrated its capabilities in achieving human-level performance on the Arcade Learning Environment (ALE) (Bellemare et al., 2013). Kaiser et al. (2020) introduced the Atari 100K benchmark |
| Dataset Splits | Yes | While Atari 100K training set consists of 26 games, we evaluate the performance of various components in BBF on 29 validation games in ALE that are not in Atari 100K. |
| Hardware Specification | Yes | IRIS uses half of an A100 GPU for a week per run. SR-SPR, at its highest replay ratio of 16, uses 25% of an A100 GPU and a single CPU for roughly 24 hours. Our BBF agent at replay ratio 8 takes only 10 hours with a single CPU and half of an A100 GPU. |
| Software Dependencies | No | The paper mentions software like 'Dopamine framework', 'Python', 'NumPy', 'Matplotlib', and 'JAX', but does not provide specific version numbers for any of these components. |
| Experiment Setup | Yes | For BBF, we use RR=8 in order to balance the increased computation arising from our large network. Our n-step schedule... decreases exponentially from 10 to 3 over the first 10K gradient steps following each network reset... reset every 40k gradient steps. We choose γ1 = 0.97, slightly lower than the typical discount used for Atari, and γ2 = 0.997. We incorporate weight decay... use the AdamW optimizer (Loshchilov & Hutter, 2019) with a weight decay value of 0.1. |
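
The schedule quoted in the Experiment Setup row can be illustrated with a short sketch. This is not the authors' code: the function names, the geometric interpolation formula, the rounding of n to an integer, and the choice to anneal γ from γ1 to γ2 over the same 10K-step window are all assumptions made for illustration. The excerpt only states the endpoints (10 to 3, γ1 = 0.97, γ2 = 0.997), the 10K-step window, and the 40k-step reset period.

```python
# Illustrative sketch only; names and interpolation details are assumptions,
# not taken from the paper's released code.

RESET_PERIOD = 40_000   # gradient steps between network resets (quoted above)
ANNEAL_STEPS = 10_000   # annealing window after each reset (quoted above)
N_START, N_END = 10, 3  # n-step return horizon endpoints (quoted above)
GAMMA_START, GAMMA_END = 0.97, 0.997  # gamma_1 and gamma_2 (quoted above)


def exp_interp(start, end, frac):
    """Geometric interpolation: returns start at frac=0 and end at frac=1."""
    return start * (end / start) ** frac


def schedule(grad_step):
    """Return (n, gamma) for a given gradient step.

    Assumes a reset occurs at every multiple of RESET_PERIOD (including
    step 0) and that gamma is annealed over the same window as n.
    """
    steps_since_reset = grad_step % RESET_PERIOD
    frac = min(steps_since_reset / ANNEAL_STEPS, 1.0)
    n = round(exp_interp(N_START, N_END, frac))  # assumed integer rounding
    gamma = exp_interp(GAMMA_START, GAMMA_END, frac)
    return n, gamma


if __name__ == "__main__":
    for step in (0, 5_000, 10_000, 25_000, 40_000):
        print(step, schedule(step))
```

Under these assumptions, the horizon and discount hold at their final values between the end of the annealing window and the next reset, then restart from 10 and 0.97.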