Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning

Authors: Jiashun Liu, Zihao Wu, Johan Obando Ceron, Pablo Samuel Castro, Aaron C. Courville, Ling Pan

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a series of experiments to investigate whether ReGraMa can mitigate neuronal activity loss and enhance performance. Specifically, we evaluate the effectiveness of ReGraMa across three representative and widely adopted architecture types: (i) the residual network-based policy (Sec. 5.1), (ii) the online policy parameterized by a diffusion model (Sec. 5.2), and (iii) the MLP policy featuring various activation functions (Sec. 5.3).
Researcher Affiliation | Academia | (1) Hong Kong University of Science and Technology; (2) Mila Québec AI Institute; (3) Université de Montréal
Pseudocode | Yes | Algorithm 1: ReGraMa. Input: model θ, threshold τ, frequency T. while t < maximum training time do: update θ with regular RL loss; if t mod T == 0 then: for each layer ℓ do: for each neuron i do: calculate G^ℓ_i (Eq. 2); if G^ℓ_i ≤ τ then reinitialize neuron i.
Open Source Code | Yes | We make our code available. Code: https://github.com/torressliu/grad-based-plasticity-metrics
Open Datasets | Yes | We conduct extensive experiments on MuJoCo [Brockman et al., 2016], DeepMind Control Suite [Tassa et al., 2018], showing that GraMa-guided resetting improves performance and learning stability across diverse architectures. We trained a traditional fully connected network with ReLU on the CIFAR100 benchmark [Krizhevsky, 2009].
Dataset Splits | No | The paper mentions using tasks from well-known environments like MuJoCo and DeepMind Control Suite, and discusses a continuous learning setup with CIFAR100 where new data categories are added every 15 epochs. However, it does not provide explicit training/test/validation split percentages, sample counts, or specific predefined split information for any dataset.
Hardware Specification | Yes | Figure 5: (Left) Execution time comparison based on BRO-net (RTX3090 GPU);
Software Dependencies | No | The paper mentions using Python, NumPy, Matplotlib, Jupyter, Pandas, and CleanRL, and bases its implementation on official codebases for BRO-net and DACER. However, it does not specify version numbers for any of these software components or libraries, which are required for a reproducible description of ancillary software.
Experiment Setup | Yes | Appendix A provides "Experimental Details" which includes specific "Hyperparameter setting" sections for Residual network based policy (BRO-net), Diffusion model based policy (DACER), and MLP-based SAC. These sections contain tables (e.g., Table 3, Table 5, Table 7) listing detailed hyperparameter values such as learning rates, batch sizes, discount factors, reset τ, and reset frequencies.
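
The Algorithm 1 pseudocode quoted in the Pseudocode row can be illustrated with a small NumPy sketch. This is not the authors' implementation (their code is at the repository linked above). Eq. 2 is not reproduced on this page, so the score used here (mean absolute gradient over a neuron's incoming weights, normalized by the layer average) is an assumed stand-in. The function names, the toy regression task, the reinitialization scheme, and all hyperparameter values are likewise hypothetical.

```python
import numpy as np


def neuron_scores(grad_w_in):
    """Assumed stand-in for Eq. 2: mean |gradient| over each neuron's
    incoming weights, divided by the layer-wide average."""
    per_neuron = np.abs(grad_w_in).mean(axis=0)  # shape: (n_neurons,)
    return per_neuron / (per_neuron.mean() + 1e-12)


def reset_low_gradient_neurons(w_in, grad_w_in, tau, rng):
    """Reinitialize (in place) the incoming weights of every neuron whose
    score is <= tau. Returns the boolean mask of reset neurons."""
    mask = neuron_scores(grad_w_in) <= tau
    n_in = w_in.shape[0]
    # Hypothetical init scheme: zero-mean Gaussian scaled by fan-in.
    w_in[:, mask] = rng.normal(0.0, 1.0 / np.sqrt(n_in),
                               size=(n_in, int(mask.sum())))
    return mask


def train(steps=20, reset_every=5, tau=0.1, lr=0.01, seed=0):
    """Toy regression loop (not RL) that only shows where the periodic
    check from Algorithm 1 sits relative to the regular update."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(32, 4))
    y = rng.normal(size=(32, 1))
    w1 = rng.normal(0.0, 0.5, size=(4, 8))  # input -> hidden
    w2 = rng.normal(0.0, 0.5, size=(8, 1))  # hidden -> output
    total_resets = 0
    for t in range(1, steps + 1):
        # Forward pass and manual gradients of the mean squared error.
        pre = x @ w1
        h = np.maximum(pre, 0.0)
        err = (h @ w2 - y) * (2.0 / len(x))
        g_w2 = h.T @ err
        g_w1 = x.T @ ((err @ w2.T) * (pre > 0))
        # Regular update (stands in for "update theta with regular RL loss").
        w1 -= lr * g_w1
        w2 -= lr * g_w2
        # Every `reset_every` steps, score the hidden neurons and reset
        # those at or below the threshold.
        if t % reset_every == 0:
            mask = reset_low_gradient_neurons(w1, g_w1, tau, rng)
            total_resets += int(mask.sum())
    return w1, w2, total_resets
```

Calling `train()` runs 20 updates and checks the hidden layer at steps 5, 10, 15, and 20. A real agent would replace the toy loss with its RL objective and apply the check to every layer, as the quoted pseudocode describes.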