Small batch deep reinforcement learning
Authors: Johan Obando-Ceron, Marc G. Bellemare, Pablo Samuel Castro
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon. |
| Researcher Affiliation | Collaboration | Johan Obando-Ceron (Mila, Université de Montréal, jobando0730@gmail.com); Marc G. Bellemare (Mila, Université de Montréal, bellemam@mila.quebec); Pablo Samuel Castro (Google DeepMind and Mila, Université de Montréal, psc@google.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | Our experiments were built on open source code, mostly from the Dopamine repository. The root directory is https://github.com/google/dopamine/tree/master/dopamine/, and the relevant subdirectories are: DQN, Rainbow, QR-DQN and IQN agents from /jax/agents/; Atari-100k agents from /labs/atari-100k/; batch size from /jax/agents/quantile/configs/quantile.gin (line 36); exploration ϵ = 0 from /jax/agents/quantile/configs/quantile.gin (line 16); Resnet from /labs/offline-rl/jax/networks.py (line 108); dormant neurons metric from /labs/redo/. (A hedged gin-override sketch is given after the table.) |
| Open Datasets | Yes | We use the Jax implementations of RL agents, with their default hyperparameter values, provided by the Dopamine library [Castro et al., 2018] and applied to the Arcade Learning Environment (ALE) [Bellemare et al., 2013]. |
| Dataset Splits | No | The paper uses the Arcade Learning Environment where data is generated through interaction, not a fixed dataset with predefined training, validation, and test splits. While it describes evaluation protocols, it does not specify dataset splits in the conventional supervised learning sense. |
| Hardware Specification | Yes | All experiments were run on NVIDIA Tesla P100 GPUs. |
| Software Dependencies | No | The paper mentions using 'Jax implementations of RL agents' and the 'Dopamine library', but it does not provide specific version numbers for these or other software dependencies. It also acknowledges NumPy, Matplotlib, and JAX without version numbers. |
| Experiment Setup | Yes | Experimental setup: We use the Jax implementations of RL agents, with their default hyperparameter values, provided by the Dopamine library [Castro et al., 2018] and applied to the Arcade Learning Environment (ALE) [Bellemare et al., 2013]. It is worth noting that the default batch size is 32, which we indicate with a black color in all the plots below, for clarity. We evaluate our agents on 20 games chosen by Fedus et al. [2020] for their analysis of replay ratios, picked to offer a diversity of difficulty and dynamics. To reduce the computational burden, we ran most of our experiments for 100 million frames (as opposed to the standard 200 million). For evaluation, we follow the guidelines of Agarwal et al. [2021]. Specifically, we run 3 independent seeds for each experiment and report the human-normalized interquartile mean (IQM), aggregated over the 20 games, configurations, and seeds, with 95% stratified bootstrap confidence intervals. (An illustrative IQM computation sketch is given after the table.) |
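
The batch-size and exploration changes referenced in the Open Source Code row are plain gin overrides on top of Dopamine's stock configs. The sketch below shows how such a run might be launched; it is not the authors' exact script, and the specific binding names (the replay-buffer `batch_size`, `JaxDQNAgent.epsilon_train`) and the choice of game are assumptions for illustration, so confirm them against the quoted quantile.gin lines in your Dopamine checkout.

```python
# Minimal sketch, not the authors' exact launch script: running Dopamine's JAX
# QR-DQN (quantile) agent with a reduced replay batch size via gin overrides.
# Assumption: the binding names below are illustrative; check quantile.gin
# (the lines cited in the table) for the names used in your Dopamine version.
from dopamine.discrete_domains import run_experiment

gin_files = ["dopamine/jax/agents/quantile/configs/quantile.gin"]
gin_bindings = [
    # Default batch size is 32; the paper studies smaller values such as 8.
    "OutOfGraphPrioritizedReplayBuffer.batch_size = 8",   # assumed binding name
    "JaxDQNAgent.epsilon_train = 0.0",                    # assumed binding for exploration eps = 0
    "atari_lib.create_atari_environment.game_name = 'Breakout'",  # arbitrary example game
]

run_experiment.load_gin_configs(gin_files, gin_bindings)
runner = run_experiment.create_runner("/tmp/small_batch_qrdqn")  # hypothetical output dir
runner.run_experiment()
```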
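
The evaluation protocol in the Experiment Setup row (human-normalized IQM with 95% stratified bootstrap confidence intervals, following Agarwal et al. [2021]) is typically computed with the rliable library. The self-contained NumPy/SciPy sketch below illustrates the statistic on synthetic scores; the score matrix, bootstrap count, and helper names are assumptions for illustration only, not the paper's data or code.

```python
# Illustrative sketch of the evaluation statistic described above: the
# interquartile mean (IQM) of human-normalized scores with a 95% stratified
# bootstrap confidence interval. Uses synthetic scores, not the paper's results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical score matrix: rows = runs (e.g. 3 seeds), cols = 20 games.
scores = rng.lognormal(mean=0.0, sigma=0.5, size=(3, 20))

def iqm(x):
    # IQM: mean of the middle 50% of all run-game scores (trim 25% from each end).
    return stats.trim_mean(x.flatten(), proportiontocut=0.25)

def stratified_bootstrap_ci(x, n_boot=2000, alpha=0.05):
    # Resample runs with replacement independently within each game (stratified over games).
    n_runs, n_games = x.shape
    estimates = []
    for _ in range(n_boot):
        resampled = np.stack(
            [x[rng.integers(0, n_runs, size=n_runs), g] for g in range(n_games)],
            axis=1,
        )
        estimates.append(iqm(resampled))
    lo, hi = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

print(f"IQM = {iqm(scores):.3f}, 95% CI = {stratified_bootstrap_ci(scores)}")
```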