Small batch deep reinforcement learning
Authors: Johan Obando-Ceron, Marc G. Bellemare, Pablo Samuel Castro
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon. |
| Researcher Affiliation | Collaboration | Johan Obando-Ceron (Mila, Université de Montréal, jobando0730@gmail.com); Marc G. Bellemare (Mila, Université de Montréal, bellemam@mila.quebec); Pablo Samuel Castro (Google DeepMind and Mila, Université de Montréal, psc@google.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | Our experiments were built on open source code, mostly from the Dopamine repository. The root directory is https://github.com/google/dopamine/tree/master/dopamine/, and the relevant subdirectories are: DQN, Rainbow, QR-DQN and IQN agents from /jax/agents/; Atari-100k agents from /labs/atari-100k/; batch size from /jax/agents/quantile/configs/quantile.gin (line 36); exploration ϵ = 0 from /jax/agents/quantile/configs/quantile.gin (line 16); Resnet from /labs/offline-rl/jax/networks.py (line 108); dormant neurons metric from /labs/redo/. (A hedged gin-override sketch is given after the table.) |
| Open Datasets | Yes | We use the Jax implementations of RL agents, with their default hyperparameter values, provided by the Dopamine library [Castro et al., 2018] and applied to the Arcade Learning Environment (ALE) [Bellemare et al., 2013]. |
| Dataset Splits | No | The paper uses the Arcade Learning Environment where data is generated through interaction, not a fixed dataset with predefined training, validation, and test splits. While it describes evaluation protocols, it does not specify dataset splits in the conventional supervised learning sense. |
| Hardware Specification | Yes | All experiments were run on NVIDIA Tesla P100 GPUs. |
| Software Dependencies | No | The paper mentions using 'Jax implementations of RL agents' and the 'Dopamine library', but it does not provide specific version numbers for these or other software dependencies. It also acknowledges NumPy, Matplotlib, and JAX without version numbers. |
| Experiment Setup | Yes | Experimental setup: We use the Jax implementations of RL agents, with their default hyperparameter values, provided by the Dopamine library [Castro et al., 2018] and applied to the Arcade Learning Environment (ALE) [Bellemare et al., 2013]. It is worth noting that the default batch size is 32, which we indicate with a black color in all the plots below, for clarity. We evaluate our agents on 20 games chosen by Fedus et al. [2020] for their analysis of replay ratios, picked to offer a diversity of difficulty and dynamics. To reduce the computational burden, we ran most of our experiments for 100 million frames (as opposed to the standard 200 million). For evaluation, we follow the guidelines of Agarwal et al. [2021]. Specifically, we run 3 independent seeds for each experiment and report the human-normalized interquartile mean (IQM), aggregated over the 20 games, configurations, and seeds, with 95% stratified bootstrap confidence intervals. (An illustrative IQM computation sketch is given after the table.) |
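
The batch-size and exploration changes referenced in the Open Source Code row are plain gin overrides on top of Dopamine's stock configs. The sketch below shows how such a run might be launched; it is not the authors' exact script, and the specific binding names (the replay-buffer `batch_size`, `JaxDQNAgent.epsilon_train`) and the choice of game are assumptions for illustration, so confirm them against the quoted quantile.gin lines in your Dopamine checkout.

```python
# Minimal sketch, not the authors' exact launch script: running Dopamine's JAX
# QR-DQN (quantile) agent with a reduced replay batch size via gin overrides.
# Assumption: the binding names below are illustrative; check quantile.gin
# (the lines cited in the table) for the names used in your Dopamine version.
from dopamine.discrete_domains import run_experiment

gin_files = ["dopamine/jax/agents/quantile/configs/quantile.gin"]
gin_bindings = [
    # Default batch size is 32; the paper studies smaller values such as 8.
    "OutOfGraphPrioritizedReplayBuffer.batch_size = 8",   # assumed binding name
    "JaxDQNAgent.epsilon_train = 0.0",                    # assumed binding for exploration eps = 0
    "atari_lib.create_atari_environment.game_name = 'Breakout'",  # arbitrary example game
]

run_experiment.load_gin_configs(gin_files, gin_bindings)
runner = run_experiment.create_runner("/tmp/small_batch_qrdqn")  # hypothetical output dir
runner.run_experiment()
```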
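
The evaluation protocol in the Experiment Setup row (human-normalized IQM with 95% stratified bootstrap confidence intervals, following Agarwal et al. [2021]) is typically computed with the rliable library. The self-contained NumPy/SciPy sketch below illustrates the statistic on synthetic scores; the score matrix, bootstrap count, and helper names are assumptions for illustration only, not the paper's data or code.

```python
# Illustrative sketch of the evaluation statistic described above: the
# interquartile mean (IQM) of human-normalized scores with a 95% stratified
# bootstrap confidence interval. Uses synthetic scores, not the paper's results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical score matrix: rows = runs (e.g. 3 seeds), cols = 20 games.
scores = rng.lognormal(mean=0.0, sigma=0.5, size=(3, 20))

def iqm(x):
    # IQM: mean of the middle 50% of all run-game scores (trim 25% from each end).
    return stats.trim_mean(x.flatten(), proportiontocut=0.25)

def stratified_bootstrap_ci(x, n_boot=2000, alpha=0.05):
    # Resample runs with replacement independently within each game (stratified over games).
    n_runs, n_games = x.shape
    estimates = []
    for _ in range(n_boot):
        resampled = np.stack(
            [x[rng.integers(0, n_runs, size=n_runs), g] for g in range(n_games)],
            axis=1,
        )
        estimates.append(iqm(resampled))
    lo, hi = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

print(f"IQM = {iqm(scores):.3f}, 95% CI = {stratified_bootstrap_ci(scores)}")
```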