Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Plasticity as the Mirror of Empowerment

Authors: David Abel, Michael Bowling, André Barreto, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder P. Singh

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Lastly, we include a brief experiment to ground our theory in Appendix E, though suggest that deeper empirical work, and an efficient GDI estimator, are valuable directions for further research. ... In the first experiment, we use an ε-greedy policy for exploration and study the impact of varying ε between [0, 1] on plasticity. We estimate the plasticity of Q-learning for each of these values of ε in the time intervals [1:3] and [2:5], so I(O_{1:3} → A_{2:5}). Results are shown in Figure 4(a). We see that, when ε = 0 the agent is relying entirely on the greedy policy to drive its actions, which is determined by the past observations.
Researcher Affiliation | Collaboration | David Abel (Google DeepMind); Michael Bowling (Amii, University of Alberta); André Barreto (Google DeepMind); Will Dabney (Google DeepMind); Shi Dong (Google DeepMind); Steven Hansen (Google DeepMind); Anna Harutyunyan (Google DeepMind); Khimya Khetarpal (Google DeepMind); Clare Lyle (Google DeepMind); Razvan Pascanu (Google DeepMind, Mila); Georgios Piliouras (Google DeepMind); Doina Precup (Google DeepMind); Jonathan Richens (Google DeepMind); Mark Rowland (Google DeepMind); Tom Schaul (Google DeepMind); Satinder Singh (Google DeepMind)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes theoretical concepts and then mentions using 'tabular Q-learning' for experiments, but does not provide its pseudocode.
Open Source Code | No | Answer: [No] Justification: The experiment in Appendix E is very simple, and we believe can be reimplemented as needed.
Open Datasets | No | The experiments make use of a 'two-armed Bernoulli bandit'. This is a generative process or environment setup, not a pre-existing dataset for which concrete access information (link, DOI, repository, or formal citation) is typically provided or required. The paper does not provide such information for a dataset.
Dataset Splits | No | The experiments are conducted on a 'two-armed Bernoulli bandit', which is a problem setup or generative environment rather than a fixed dataset. Therefore, the concept of training, testing, or validation dataset splits, or their specific percentages/counts, does not directly apply or is not mentioned in the context of data partitioning.
Hardware Specification | Yes | Answer: [Yes] Justification: We used a single CPU to run the experiments, as they are small in scale.
Software Dependencies | No | The paper mentions 'tabular Q-learning' and an 'ε-greedy policy' as methodologies, but it does not specify any particular software libraries, packages, or solvers with their corresponding version numbers that would be necessary to replicate the experiments.
Experiment Setup | Yes | In the first experiment, we use an ε-greedy policy for exploration and study the impact of varying ε between [0, 1] on plasticity... in the time intervals [1:3] and [2:5], so I(O_{1:3} → A_{2:5}). ... In the second experiment, we study the impact of optimism and pessimism on plasticity and empowerment, again in the two-armed Bernoulli bandit. To vary the degree of optimism or pessimism present in the agent, we vary the initial Q value used for each action from -1 to 1 and examine the impact on plasticity and empowerment along the same intervals [1:3] and [2:5].
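
Since the paper releases no code, the setup quoted in the Experiment Setup row can only be illustrated, not reproduced. The following is a minimal sketch of ε-greedy tabular Q-learning on a two-armed Bernoulli bandit, with the initial Q value exposed so that it can be varied between -1 and 1. The function name run_bandit, the arm probabilities, the step size, the seed handling, and the tie-breaking rule are all illustrative assumptions, not details taken from the paper. The sketch only generates action and reward sequences; it does not estimate plasticity or empowerment.

```python
import random


def run_bandit(epsilon, q_init, steps=5, arm_probs=(0.3, 0.7),
               step_size=0.5, seed=0):
    """Epsilon-greedy tabular Q-learning on a two-armed Bernoulli bandit.

    All defaults are hypothetical; the paper does not report them here.
    Returns a list of (action, reward) pairs, one per time step.
    """
    rng = random.Random(seed)
    # One Q value per arm; q_init controls optimism (high) or pessimism (low).
    q = [q_init, q_init]
    history = []
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            action = rng.randrange(2)
        else:
            # Exploit: greedy arm, ties broken toward arm 0 (an assumption).
            action = 0 if q[0] >= q[1] else 1
        # Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < arm_probs[action] else 0
        # The bandit has a single state, so we assume the Q-learning
        # target reduces to the immediate reward (no bootstrapping).
        q[action] += step_size * (reward - q[action])
        history.append((action, reward))
    return history
```

Sweeping epsilon over [0, 1] with q_init fixed, or sweeping q_init over [-1, 1] with epsilon fixed, and repeating over many seeds would give the sampled trajectories from which an estimator of the quantities in the paper could be computed; that estimator is outside the scope of this sketch.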