The Dormant Neuron Phenomenon in Deep Reinforcement Learning
Authors: Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance. |
| Researcher Affiliation | Collaboration | 1Eindhoven University of Technology, The Netherlands 2Work done while the author was an intern at Google DeepMind 3Google DeepMind 4Mila. Correspondence to: Ghada Sokar <g.a.z.n.sokar@tue.nl>, Rishabh Agarwal <rishabhagarwal@google.com>, Pablo Samuel Castro <psc@google.com>, Utku Evci <evci@google.com>. |
| Pseudocode | Yes | Algorithm 1 ReDo. Input: Network parameters θ, threshold τ, training steps T, frequency F. for t = 1 to T do: Update θ with regular RL loss; if t mod F == 0 then: for each neuron i do: if s_i^ℓ ≤ τ then: Reinitialize input weights of neuron i; Set outgoing weights of neuron i to 0; end if; end for; end if; end for |
| Open Source Code | Yes | All our experiments and implementations were conducted using the Dopamine framework (Castro et al., 2018). Code is available at https://github.com/google/dopamine/tree/master/dopamine/labs/redo |
| Open Datasets | Yes | We evaluate DQN on 17 games from the Arcade Learning Environment (Bellemare et al., 2013): Asterix, Demon Attack, Seaquest, Wizard of Wor, Beam Rider, Road Runner, James Bond, Qbert, Breakout, Enduro, Space Invaders, Pong, Zaxxon, Yars Revenge, Ms. Pacman, Double Dunk, Ice Hockey. This set is used by previous works (Kumar et al., 2021a;b) to study the implicit under-parameterization phenomenon in offline RL. For hyper-parameter tuning, we used five games (Asterix, Demon Attack, Seaquest, Breakout, Beam Rider). We evaluate DrQ(ϵ) on the 26 games of Atari 100K (Kaiser et al., 2019). We used the best hyper-parameters found for DQN in training DrQ(ϵ). |
| Dataset Splits | No | The paper describes online data collection through environment interaction and replay buffers for the RL tasks. Explicit train/validation/test splits with percentages or sample counts, as typically reported in supervised learning, are not provided in the main text for the RL experiments, nor for the CIFAR-10 experiment. |
| Hardware Specification | No | The paper does not explicitly provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the Dopamine framework, TF-Agents, NumPy, Matplotlib, JAX, and Adam optimizer. However, it does not provide specific version numbers for these software components, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | All our experiments and implementations were conducted using the Dopamine framework (Castro et al., 2018). ... The hyper-parameters are provided in Tables 1, 2, and 3. ... The hyper-parameters are provided in Table 4. ... For replay ratio, we evaluate replay ratio values: {0.25 (default), 0.5, 1, 2}. ... ReDo hyper-parameters. We did the hyper-parameter search for DQN trained with RR = 1 using the nature CNN architecture. We searched over the grids [1000, 10000, 100000] and [0, 0.01, 0.1] for the recycling period and τ-dormant, respectively. |
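
The Algorithm 1 pseudocode quoted above amounts to a periodic recycling pass over each layer's neurons: score each neuron's activity, re-initialize the incoming weights of neurons at or below the threshold, and zero their outgoing weights. Below is a minimal NumPy sketch of one such pass under that reading; the dormancy score follows the paper's definition (mean absolute activation normalized by the layer-wide mean), but the function names, the choice of initializer for the fresh weights, and the bias handling are illustrative assumptions, not the Dopamine implementation linked above.

```python
import numpy as np

def dormancy_scores(activations):
    """Per-neuron dormancy score for one layer.

    `activations` has shape (batch, num_neurons). The score of neuron i is its
    mean absolute activation divided by the layer-wide mean of those values,
    following the paper's normalized definition of s_i^l.
    """
    mean_abs = np.abs(activations).mean(axis=0)      # (num_neurons,)
    return mean_abs / (mean_abs.mean() + 1e-9)       # normalize by the layer mean

def recycle_dormant(w_in, b_in, w_out, activations, tau=0.0, rng=None):
    """One ReDo-style recycling pass for a dense layer (illustrative sketch).

    w_in:  incoming weights, shape (fan_in, num_neurons)
    b_in:  biases, shape (num_neurons,)
    w_out: outgoing weights into the next layer, shape (num_neurons, fan_out)
    Neurons with score <= tau get their incoming parameters re-initialized and
    their outgoing weights zeroed, as in Algorithm 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = dormancy_scores(activations)
    dormant = scores <= tau                          # boolean mask over neurons

    # Re-initialize incoming weights of dormant neurons. A LeCun-style normal
    # init is used here as a stand-in for whatever initializer the layer used.
    fan_in = w_in.shape[0]
    fresh = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=w_in.shape)
    w_in = np.where(dormant[None, :], fresh, w_in)
    b_in = np.where(dormant, 0.0, b_in)

    # Zero the outgoing weights so recycling does not perturb the next layer's output.
    w_out = np.where(dormant[:, None], 0.0, w_out)
    return w_in, b_in, w_out, dormant
```

In training, such a pass would run every F gradient steps (the `t mod F == 0` check in Algorithm 1), once per hidden layer, with activations gathered from a batch drawn from the replay buffer.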
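The reported ReDo hyper-parameter search (for DQN at replay ratio 1 with the Nature CNN) is a small Cartesian grid over the recycling period and τ-dormant. The sketch below only enumerates that 3×3 grid; `run_fn` is a hypothetical training entry point standing in for the actual Dopamine-based runs.

```python
from itertools import product

# Grids reported for the ReDo hyper-parameter search (DQN, RR = 1, Nature CNN).
RECYCLING_PERIODS = [1_000, 10_000, 100_000]  # gradient steps between recycling passes
TAU_DORMANT = [0.0, 0.01, 0.1]                # dormancy threshold tau

def sweep(run_fn):
    """Launch one run per (recycling period, tau) configuration.

    `run_fn(recycling_period, tau)` is a hypothetical placeholder for the
    training entry point; the paper's experiments used the Dopamine framework.
    """
    results = {}
    for period, tau in product(RECYCLING_PERIODS, TAU_DORMANT):
        results[(period, tau)] = run_fn(recycling_period=period, tau=tau)
    return results
```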