Deep Reinforcement Learning with Plasticity Injection

Authors: Evgenii Nikishin, Junhyuk Oh, Georg Ostrovski, Clare Lyle, Razvan Pascanu, Will Dabney, André Barreto

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The applications of this intervention are two-fold: first, as a diagnostic tool: if injection increases the performance, we may conclude that an agent's network was losing its plasticity. This tool allows us to identify a subset of Atari environments where the lack of plasticity causes performance plateaus, motivating future studies on understanding and combating plasticity loss. Second, plasticity injection can be used to improve the computational efficiency of RL training if the agent has to re-learn from scratch due to exhausted plasticity, or by growing the agent's network dynamically without compromising performance. The results on Atari show that plasticity injection attains stronger performance compared to alternative methods while being computationally efficient. (A sketch of the injection mechanism appears below this table.)
Researcher Affiliation | Collaboration | Evgenii Nikishin, Junhyuk Oh, Georg Ostrovski, Clare Lyle, Razvan Pascanu, Will Dabney, André Barreto (DeepMind). Work done during an internship; currently at Mila, Université de Montréal.
Pseudocode | No | The paper includes illustrations of the network architecture (Figure 2) and descriptions of the intervention, but it does not provide any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | The baseline agent is Double DQN [Van Hasselt et al., 2016] learning for 200M interactions on a standard set of 57 Atari games from the Arcade Learning Environment benchmark [Bellemare et al., 2013].
Dataset Splits | No | The paper mentions training for 200M interactions and evaluating performance, but it does not specify explicit training, validation, and test dataset splits for reproducibility.
Hardware Specification | Yes | At the same time, it saves about 20 hours of wallclock time on an A100 GPU, since it uses a smaller network up to 50M frames and has fewer parameters that are updated after plasticity injection.
Software Dependencies | No | We acknowledge the Python community [Van Rossum and Drake Jr, 1995, Oliphant, 2007] for developing the core tools that enabled this work, including JAX [Bradbury et al., 2018, Babuschkin et al., 2020], Jupyter [Kluyver et al., 2016], NumPy [Oliphant, 2006, Van Der Walt et al., 2011], SciPy [Jones et al., 2014], Matplotlib [Hunter, 2007], and pandas [McKinney, 2012].
Experiment Setup | Yes | The majority of the experiments use a single plasticity injection after 50M frames; otherwise, we explicitly specify the number and timesteps of injections. Appendix B discusses ablations on the design choices when using plasticity injection. A convolutional neural network employed by the Double DQN agent consists of 5 layers. The encoder corresponds to the first three of them (hence k = 3), while the head refers to the last two. Since DQN-based agents employ a target copy of the network parameters, we perform the same interventions on them. For reliable evaluation of the performance across environments, we adopt the protocol of Agarwal et al. [2021] with a focus on the interquartile mean (IQM). All experiments use 3 random seeds.
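
The quoted passages describe plasticity injection only at a high level. Based on the paper's description of the intervention (and its Figure 2), the sketch below gives a minimal JAX illustration of the head-level mechanism: the current head is frozen, two freshly initialized copies are added so that their difference is exactly zero at injection time, and gradients subsequently flow only through one of the copies. Function names, dimensions, and the initialization scheme are illustrative assumptions, not released code (the paper does not open-source its implementation).

```python
# Hedged sketch of plasticity injection on a network head (JAX).
import jax
import jax.numpy as jnp

def init_head(key, in_dim, hidden, out_dim):
    """Initialize a small 2-layer head (mirroring the 'last two layers' head)."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (in_dim, hidden)) * 0.05,
        "b1": jnp.zeros(hidden),
        "w2": jax.random.normal(k2, (hidden, out_dim)) * 0.05,
        "b2": jnp.zeros(out_dim),
    }

def head_apply(params, z):
    """Apply the head to encoder features z."""
    h = jax.nn.relu(z @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]

def inject_plasticity(key, old_head, in_dim, hidden, out_dim):
    """Freeze the old head and add two identically initialized fresh copies.

    Only 'new' is trained after injection; 'old' and 'new_frozen' stay fixed,
    so the composite output is unchanged at the moment of injection.
    """
    new_trainable = init_head(key, in_dim, hidden, out_dim)
    new_frozen = jax.tree_util.tree_map(jnp.copy, new_trainable)
    return {"old": old_head, "new": new_trainable, "new_frozen": new_frozen}

def injected_apply(heads, z):
    # Frozen part: old head minus the frozen fresh copy (no gradients).
    frozen = jax.lax.stop_gradient(
        head_apply(heads["old"], z) - head_apply(heads["new_frozen"], z)
    )
    # Trainable part: the second fresh copy, which supplies new plasticity.
    return frozen + head_apply(heads["new"], z)

# At injection time the two fresh copies cancel, so Q-values are unchanged.
old = init_head(jax.random.PRNGKey(0), 512, 512, 18)   # e.g. 18 Atari actions
heads = inject_plasticity(jax.random.PRNGKey(1), old, 512, 512, 18)
z = jnp.ones((4, 512))                                  # a batch of encoder features
assert jnp.allclose(injected_apply(heads, z), head_apply(old, z), atol=1e-5)
```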
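
The Experiment Setup row also cites the evaluation protocol of Agarwal et al. [2021] with the interquartile mean (IQM) as the headline aggregate over 57 games and 3 seeds. As a hedged sketch (not the authors' evaluation code), the IQM over pooled per-game, per-seed scores can be computed with SciPy's trimmed mean; the score array below is synthetic and only illustrates the shape implied by the quoted setup.

```python
# Hedged sketch of the IQM aggregate used for evaluation.
import numpy as np
from scipy.stats import trim_mean

def iqm(scores: np.ndarray) -> float:
    """Interquartile mean: average after discarding the lowest 25% and
    highest 25% of runs, with games and seeds pooled together."""
    return float(trim_mean(scores.reshape(-1), proportiontocut=0.25))

# Illustrative human-normalized scores: 57 Atari games x 3 seeds.
rng = np.random.default_rng(0)
scores = rng.lognormal(mean=0.0, sigma=1.0, size=(57, 3))
print(f"IQM over {scores.size} runs: {iqm(scores):.3f}")
```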