Bigger, Better, Faster: Human-level Atari with human-level efficiency

Authors: Max Schwarzer, Johan Samir Obando Ceron, Aaron Courville, Marc G Bellemare, Rishabh Agarwal, Pablo Samuel Castro

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available.
Researcher Affiliation | Collaboration | ¹Google DeepMind, ²Mila, ³Université de Montréal. Correspondence to: Max Schwarzer <MaxA.Schwarzer@gmail.com>, Johan Obando Ceron <jobando0730@gmail.com>.
Pseudocode | No | The paper describes the algorithms and components in prose, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We make our code and data publicly available.
Open Datasets | Yes | Mnih et al. (2015a) introduced the agent DQN by combining temporal-difference learning with deep networks, and demonstrated its capabilities in achieving human-level performance on the Arcade Learning Environment (ALE) (Bellemare et al., 2013). Kaiser et al. (2020) introduced the Atari 100K benchmark.
Dataset Splits | Yes | While the Atari 100K training set consists of 26 games, we evaluate the performance of various components in BBF on 29 validation games in the ALE that are not in Atari 100K.
Hardware Specification | Yes | IRIS uses half of an A100 GPU for a week per run. SR-SPR, at its highest replay ratio of 16, uses 25% of an A100 GPU and a single CPU for roughly 24 hours. Our BBF agent at replay ratio 8 takes only 10 hours with a single CPU and half of an A100 GPU.
Software Dependencies | No | The paper mentions software such as the Dopamine framework, Python, NumPy, Matplotlib, and JAX, but does not provide specific version numbers for any of these components.
Experiment Setup | Yes | For BBF, we use RR=8 in order to balance the increased computation arising from our large network. Our n-step schedule... decreases exponentially from 10 to 3 over the first 10K gradient steps following each network reset... reset every 40K gradient steps. We choose γ1 = 0.97, slightly lower than the typical discount used for Atari, and γ2 = 0.997. We incorporate weight decay... use the AdamW optimizer (Loshchilov & Hutter, 2019) with a weight decay value of 0.1.
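
To make the setup in the last row concrete, the sketch below shows one way to implement the quoted schedules and optimizer settings in JAX/optax: an n-step length that decays exponentially from 10 to 3 over the first 10K gradient steps after each reset (resets every 40K gradient steps), a discount rising from 0.97 to 0.997, and AdamW with weight decay 0.1. This is a minimal illustration, not the authors' code; the choice to interpolate geometrically in (1 − γ) space and the learning rate value are assumptions not stated in this section.

```python
# Minimal sketch of BBF-style annealing schedules and optimizer settings
# (assumptions noted in comments; not the authors' implementation).
import jax.numpy as jnp
import optax

ANNEAL_STEPS = 10_000      # schedule horizon after each network reset
RESET_PERIOD = 40_000      # gradient steps between resets
N_START, N_END = 10, 3     # multi-step return length
GAMMA_START, GAMMA_END = 0.97, 0.997


def exp_interp(start, end, frac):
    """Exponential (geometric) interpolation from start to end."""
    frac = jnp.clip(frac, 0.0, 1.0)
    return start * (end / start) ** frac


def schedules(grad_step):
    """n-step and discount as a function of the total gradient step count."""
    steps_since_reset = grad_step % RESET_PERIOD
    frac = steps_since_reset / ANNEAL_STEPS
    n = jnp.round(exp_interp(N_START, N_END, frac)).astype(jnp.int32)
    # Assumption: anneal gamma by interpolating geometrically in (1 - gamma).
    gamma = 1.0 - exp_interp(1.0 - GAMMA_START, 1.0 - GAMMA_END, frac)
    return n, gamma


# AdamW with the weight decay reported in the paper (0.1); the learning
# rate here is a placeholder, not a value quoted in this section.
optimizer = optax.adamw(learning_rate=1e-4, weight_decay=0.1)
```

At replay ratio 8, `schedules` would be queried once per gradient step, so both quantities reach their final values well before the next reset at 40K steps.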