Reproducibility Index

Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that was validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning

Authors: Jiashun Liu, Zihao Wu, Johan Obando-Ceron, Pablo Samuel Castro, Aaron C. Courville, Ling Pan

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct a series of experiments to investigate whether ReGraMa can mitigate neuronal activity loss and enhance performance. Specifically, we evaluate the effectiveness of ReGraMa across three representative and widely adopted architecture types: (i) the residual network-based policy (Sec. 5.1), (ii) the online policy parameterized by a diffusion model (Sec. 5.2), and (iii) the MLP policy featuring various activation functions (Sec. 5.3).
Researcher Affiliation	Academia	(1) Hong Kong University of Science and Technology; (2) Mila Québec AI Institute; (3) Université de Montréal
Pseudocode	Yes	Algorithm 1: ReGraMa. Input: model θ, threshold τ, reset frequency T. while t < maximum training time do: update θ with the regular RL loss; if t mod T == 0 then: for each layer ℓ do: for each neuron i do: calculate G_i^ℓ (Eq. 2); if G_i^ℓ ≤ τ then reinitialize neuron i.
Open Source Code	Yes	We make our code available. Code: https://github.com/torressliu/grad-based-plasticity-metrics
Open Datasets	Yes	We conduct extensive experiments on MuJoCo [Brockman et al., 2016], DeepMind Control Suite [Tassa et al., 2018], showing that GraMa-guided resetting improves performance and learning stability across diverse architectures. We trained a traditional fully connected network with ReLU on the CIFAR100 benchmark [Krizhevsky, 2009].
Dataset Splits	No	The paper mentions using tasks from well-known environments like MuJoCo and DeepMind Control Suite, and discusses a continuous learning setup with CIFAR100 where new data categories are added every 15 epochs. However, it does not provide explicit training/test/validation split percentages, sample counts, or specific predefined split information for any dataset.
Hardware Specification	Yes	Figure 5: (Left) Execution time comparison based on BRO-net (RTX3090 GPU);
Software Dependencies	No	The paper mentions using Python, NumPy, Matplotlib, Jupyter, Pandas, and CleanRL, and bases its implementation on official codebases for BRO-net and DACER. However, it does not specify version numbers for any of these software components or libraries, which are required for a reproducible description of ancillary software.
Experiment Setup	Yes	Appendix A provides "Experimental Details" which includes specific "Hyperparameter setting" sections for Residual network based policy (BRO-net), Diffusion model based policy (DACER), and MLP-based SAC. These sections contain tables (e.g., Table 3, Table 5, Table 7) listing detailed hyperparameter values such as learning rates, batch sizes, discount factors, reset τ, and reset frequencies.
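
The Algorithm 1 excerpt quoted in the Pseudocode row can be read as a periodic reset loop: every few training steps, score each neuron by its gradient and reinitialize those whose score falls at or below the threshold τ. The sketch below is a minimal plain-Python illustration of that loop, not the authors' implementation. Eq. 2 is not reproduced in this entry, so the score used here (mean absolute gradient of a neuron's incoming weights, divided by the layer average) is an assumed stand-in. The function names, the `period` argument, the "at or below τ" comparison, and the uniform reinitialization are likewise assumptions made for illustration.

```python
import random


def neuron_scores(grad_rows):
    """Score each neuron from its gradients.

    grad_rows[i] holds the loss gradients for neuron i's incoming weights.
    ASSUMPTION: this layer-normalized mean |gradient| stands in for Eq. 2,
    which is not shown in this entry.
    """
    mags = [sum(abs(g) for g in row) / len(row) for row in grad_rows]
    layer_mean = sum(mags) / len(mags)
    if layer_mean == 0.0:
        # No gradient signal anywhere in the layer: every score is zero.
        return [0.0] * len(mags)
    return [m / layer_mean for m in mags]


def reset_inactive(weights, grad_rows, tau, rng):
    """Reinitialize neurons whose score is at or below tau (assumed rule).

    weights[i] is the incoming weight vector of neuron i and is replaced
    in place. Returns the indices of the neurons that were reset.
    """
    scores = neuron_scores(grad_rows)
    reset = [i for i, s in enumerate(scores) if s <= tau]
    for i in reset:
        fan_in = len(weights[i])
        bound = 1.0 / fan_in ** 0.5  # hypothetical init range, not from the paper
        weights[i] = [rng.uniform(-bound, bound) for _ in range(fan_in)]
    return reset


def maybe_reset(step, period, layers, grads, tau, rng):
    """Run the reset pass only on steps that are multiples of `period`.

    layers and grads map a layer name to its weight rows and gradient rows.
    Returns {layer name: reset neuron indices}, or {} on non-reset steps.
    """
    if step % period != 0:
        return {}
    return {
        name: reset_inactive(layers[name], grads[name], tau, rng)
        for name in layers
    }
```

In a training loop, `maybe_reset` would be called once per step after the regular RL update, with `grads` taken from the most recent backward pass. The reset τ and reset frequency values for each architecture are those listed in the hyperparameter tables cited in the Experiment Setup row.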