The Dormant Neuron Phenomenon in Deep Reinforcement Learning

Authors: Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance.
Researcher Affiliation | Collaboration | 1Eindhoven University of Technology, The Netherlands; 2Work done while the author was an intern at Google DeepMind; 3Google DeepMind; 4Mila. Correspondence to: Ghada Sokar <g.a.z.n.sokar@tue.nl>, Rishabh Agarwal <rishabhagarwal@google.com>, Pablo Samuel Castro <psc@google.com>, Utku Evci <evci@google.com>.
Pseudocode | Yes | Algorithm 1 ReDo
Input: Network parameters θ, threshold τ, training steps T, frequency F
for t = 1 to T do
    Update θ with regular RL loss
    if t mod F == 0 then
        for each neuron i do
            if s_i^ℓ ≤ τ then
                Reinitialize input weights of neuron i
                Set outgoing weights of neuron i to 0
            end if
        end for
    end if
end for
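To illustrate the recycling step in Algorithm 1, the following minimal NumPy sketch computes per-neuron dormancy scores for one hidden layer and recycles the neurons whose score falls at or below τ. The function and variable names are ours for illustration and do not correspond to the authors' Dopamine implementation.

import numpy as np

def dormancy_scores(activations):
    # Per-neuron score: mean absolute activation over a batch,
    # normalized by the layer's average score.
    mean_abs = np.abs(activations).mean(axis=0)   # activations: (batch, n)
    return mean_abs / (mean_abs.mean() + 1e-9)

def redo_step(W_in, b_in, W_out, activations, tau=0.0, rng=None):
    # Recycle dormant neurons of one hidden layer.
    #   W_in:  (fan_in, n)   incoming weights and b_in: (n,) biases
    #   W_out: (n, fan_out)  outgoing weights feeding the next layer
    rng = np.random.default_rng() if rng is None else rng
    dormant = dormancy_scores(activations) <= tau
    n_dormant = int(dormant.sum())
    if n_dormant:
        fan_in = W_in.shape[0]
        # Re-initialize the incoming weights of dormant neurons
        # (a scaled Gaussian stands in for the original initializer) ...
        W_in[:, dormant] = rng.normal(0.0, 1.0 / np.sqrt(fan_in),
                                      size=(fan_in, n_dormant))
        b_in[dormant] = 0.0
        # ... and zero their outgoing weights so the recycled neurons
        # do not perturb the rest of the network's output.
        W_out[dormant, :] = 0.0
    return n_dormant

In a training loop, redo_step would be invoked every F gradient updates, matching the "t mod F == 0" check in Algorithm 1.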
Open Source Code | Yes | All our experiments and implementations were conducted using the Dopamine framework (Castro et al., 2018). Code is available at https://github.com/google/dopamine/tree/master/dopamine/labs/redo
Open Datasets | Yes | We evaluate DQN on 17 games from the Arcade Learning Environment (Bellemare et al., 2013): Asterix, Demon Attack, Seaquest, Wizard of Wor, Beam Rider, Road Runner, James Bond, Qbert, Breakout, Enduro, Space Invaders, Pong, Zaxxon, Yars Revenge, Ms. Pacman, Double Dunk, Ice Hockey. This set is used by previous works (Kumar et al., 2021a;b) to study the implicit under-parameterization phenomenon in offline RL. For hyper-parameter tuning, we used five games (Asterix, Demon Attack, Seaquest, Breakout, Beam Rider). We evaluate DrQ(ϵ) on the 26 games of Atari 100K (Kaiser et al., 2019). We used the best hyper-parameters found for DQN in training DrQ(ϵ).
Dataset Splits | No | The paper describes online data collection through environment interaction and the use of replay buffers for the RL tasks. While performance is evaluated, explicit train/validation/test splits with percentages or sample counts, as typically reported in supervised learning, are not provided in the paper's main text for the RL experiments. The CIFAR-10 experiment likewise does not explicitly mention train/validation/test splits.
Hardware Specification | No | The paper does not explicitly provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using the Dopamine framework, TF-Agents, NumPy, Matplotlib, JAX, and the Adam optimizer. However, it does not provide specific version numbers for these software components, which are required for a reproducible description of ancillary software.
Experiment Setup | Yes | All our experiments and implementations were conducted using the Dopamine framework (Castro et al., 2018). ... The hyper-parameters are provided in Tables 1, 2, and 3. ... The hyper-parameters are provided in Table 4. ... For replay ratio, we evaluate replay ratio values: {0.25 (default), 0.5, 1, 2}. ... ReDo hyper-parameters. We did the hyper-parameter search for DQN trained with RR = 1 using the Nature CNN architecture. We searched over the grids [1000, 10000, 100000] and [0, 0.01, 0.1] for the recycling period and τ-dormant, respectively.
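For concreteness, the ReDo hyper-parameter search quoted above can be written as a small sweep. The dictionary keys below are illustrative stand-ins and are not the gin-config names used in the Dopamine code.

from itertools import product

recycling_periods = [1_000, 10_000, 100_000]  # how often ReDo is applied (F)
tau_values = [0.0, 0.01, 0.1]                 # dormancy threshold (τ-dormant)

sweep = [
    {"recycling_period": f, "tau_dormant": tau, "replay_ratio": 1}
    for f, tau in product(recycling_periods, tau_values)
]
print(len(sweep))  # 9 configurations, searched for DQN at replay ratio 1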