Understanding Plasticity in Neural Networks
Authors: Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Avila Pires, Razvan Pascanu, Will Dabney
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper conducts a systematic empirical analysis into plasticity loss, with the goal of understanding the phenomenon mechanistically in order to guide the future development of targeted solutions. We find that loss of plasticity is deeply connected to changes in the curvature of the loss landscape, but that it often occurs in the absence of saturated units. Based on this insight, we identify a number of parameterization and optimization design choices which enable networks to better preserve plasticity over the course of training. We validate the utility of these findings on larger-scale RL benchmarks in the Arcade Learning Environment. |
| Researcher Affiliation | Industry | Google DeepMind. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. Methods are described in natural language within the text. |
| Open Source Code | No | The paper references an external implementation used as a base: the "standard implementation of double DQN (Van Hasselt et al., 2016) provided by Quan & Ostrovski (2020)", with a URL to `http://github.com/deepmind/dqn_zoo`. However, it does not state that the authors release their modified agent code or the code for the plasticity analyses and interventions described in the paper. |
| Open Datasets | Yes | We construct a simple MDP analogue of image classification, i.e. the underlying transition dynamics are defined over a set of ten states and ten actions, and the reward and transition dynamics depend on whether or not the action taken by the agent is equal to the index of its corresponding state. We construct three variants of a block MDP whose state space is the discrete set {0, …, 9} and whose observation space is given by either the CIFAR-10 or MNIST dataset. (A minimal sketch of this construction is given below the table.) |
| Dataset Splits | No | The paper describes how probe tasks are used to measure plasticity by training the network on new regression problems, and the use of replay buffers in RL. However, it does not specify a conventional training/validation/test split for the main RL task or the datasets (MNIST/CIFAR-10) used within the MDPs in the way typically required for supervised learning reproducibility. |
| Hardware Specification | No | The paper does not explicitly state any specific hardware details such as GPU models, CPU types, or cloud computing resources used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like "Adam optimizer (Kingma & Ba, 2015)" and "RMSProp optimizer" and refers to "standard implementation of double DQN... provided by Quan & Ostrovski (2020)", but it does not specify version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, etc.) required for reproducibility. |
| Experiment Setup | Yes | Optimizer instability: ...Adam optimizer with learning rate equal to 0.001, first-order moment decay β₁ = 0.9, second-order moment decay β₂ = 0.999, ε = 10⁻⁹, and ε_root = 0. With the tuned optimizer, we set β₂ = 0.9 and ε = 10⁻³. ...Double DQN: We use the RMSProp optimizer, ε-greedy exploration, and frame stacking (Mnih et al., 2015). Full implementation details can be found in Appendix A.3. ...training for 200 million frames and performing optimizer updates once every 4 environment steps... We use a replay buffer of size 100,000, and follow an ε-greedy policy during training with ε = 0.1. (An optimizer-configuration sketch appears below the table.) |
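
The block-MDP construction quoted in the Open Datasets row can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the authors' released code: the class and method names are invented, it implements only the simplest reading of the description (uniform latent-state transitions, reward 1 when the action matches the latent state index), and it assumes MNIST or CIFAR-10 images and labels are already loaded as NumPy arrays.

```python
# Hedged sketch of the block-MDP "image classification" environment described
# in the Open Datasets row: 10 latent states, 10 actions, observations drawn
# from images of the class matching the latent state, reward 1 iff the action
# equals the latent state index. Names and transition details are assumptions.
import numpy as np

class ClassificationBlockMDP:
    def __init__(self, images, labels, num_classes=10, seed=0):
        self.rng = np.random.default_rng(seed)
        # Group image indices by class so observations can be sampled per latent state.
        self.by_class = [np.flatnonzero(labels == c) for c in range(num_classes)]
        self.images = images
        self.num_classes = num_classes
        self.state = None

    def _observe(self):
        # Emit a random image whose label equals the current latent state.
        idx = self.rng.choice(self.by_class[self.state])
        return self.images[idx]

    def reset(self):
        self.state = int(self.rng.integers(self.num_classes))
        return self._observe()

    def step(self, action):
        # Reward depends on whether the action equals the latent state index.
        reward = float(action == self.state)
        # Assumed variant: the next latent state is drawn uniformly at random.
        self.state = int(self.rng.integers(self.num_classes))
        return self._observe(), reward
```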
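
The Adam settings quoted in the Experiment Setup row can likewise be written down concretely. The sketch below assumes an optax-style configuration (dqn_zoo, the Double DQN base referenced above, is JAX-based); the argument names `b1`, `b2`, `eps`, and `eps_root` belong to `optax.adam`, and while the values are taken from the quoted text, the mapping onto those arguments is an informed guess rather than the paper's exact configuration.

```python
# Illustrative sketch: "default" vs. "tuned" Adam settings from the Experiment
# Setup row, written with optax (JAX). Values come from the quoted text; the
# pairing with optax's arguments is an assumption.
import optax

# Default-style Adam: slow second-moment decay, tiny eps added outside the
# square root, eps_root = 0 inside the square root.
default_adam = optax.adam(
    learning_rate=1e-3, b1=0.9, b2=0.999, eps=1e-9, eps_root=0.0
)

# The configuration the paper refers to as the tuned optimizer: faster
# second-moment decay (b2 = 0.9) and a much larger eps (1e-3).
tuned_adam = optax.adam(learning_rate=1e-3, b1=0.9, b2=0.9, eps=1e-3)
```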