Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Modelling the control of offline processing with reinforcement learning

Authors: Eleanor Spens, Neil Burgess, Tim Behrens

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Using image classification, maze solving, and relational inference tasks, we show that the meta-controller learns an adaptive curriculum for offline learning. This lays the groundwork for normative predictions about replay in a range of experimental neuroscience tasks.
Researcher Affiliation	Academia	1University of Oxford 2University College London
Pseudocode	No	The paper describes procedural steps in paragraph text and figures, but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code	Yes	Code for all simulations can be found at https://github.com/ellie-as/rl-with-metacognitive-actions.
Open Datasets	Yes	Fashion-MNIST (a more challenging variant of MNIST featuring images of ten items of clothing) is used as a toy dataset [31].
Dataset Splits	No	The paper describes dynamic sampling and uses a small validation set for data valuation, but does not provide explicit, fixed training, testing, and validation splits for the main learning tasks or overall experimental reproduction in the traditional sense. For example, 'Note that we assume a small validation set exists for the data valuation element. To describe this more concretely, if the hippocampus contained many images, the marginal contribution of each of a subset (here, 25%) of images to the classification accuracy on a small validation set would be obtained.'
Hardware Specification	Yes	Simulations were run on Linux virtual machines with NVIDIA A100 GPUs, and on Mac OS with the MPS backend for GPU support.
Software Dependencies	No	Our experiments used Stable-Baselines3 [28] for the RL algorithms, Gymnasium [29] to create the custom environments, and PyTorch [30] for other neural network training.
Experiment Setup	Yes	At the start of each episode 200 new images are stored in a buffer representing the hippocampus. The reward in the wake state, after three offline steps in the asleep state, is the accuracy of a classifier. Fashion-MNIST (a more challenging variant of MNIST featuring images of ten items of clothing) is used as a toy dataset [31].
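
The Dataset Splits row quotes a data valuation step: the marginal contribution of each image in a 25% subset of the buffer to classification accuracy on a small validation set. The sketch below illustrates one way such a quantity could be computed. It is not the authors' implementation. The leave-one-out estimator, the nearest-centroid classifier, the synthetic 2-D data, and all function names (make_toy_data, nearest_centroid_accuracy, leave_one_out_values) are assumptions made for illustration; only the buffer size of 200 and the 25% fraction come from the rows above.

```python
import random


def make_toy_data(n, seed):
    """Hypothetical stand-in for images: 2-D points from two noisy clusters."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 3.0 * label
        data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return data


def nearest_centroid_accuracy(train, val):
    """Fit per-class centroids on `train`, return accuracy on `val` in [0, 1]."""
    sums, counts = {}, {}
    for x, y in train:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    if not sums:
        return 0.0
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    correct = 0
    for x, y in val:
        pred = min(centroids, key=lambda c: sq_dist(x, centroids[c]))
        correct += int(pred == y)
    return correct / len(val)


def leave_one_out_values(buffer, val, fraction=0.25, seed=0):
    """Assumed estimator: value of item i = accuracy with i minus accuracy without i.

    Only a random `fraction` of the buffer is valued, mirroring the 25% subset
    mentioned in the quoted text.
    """
    rng = random.Random(seed)
    k = max(1, int(len(buffer) * fraction))
    chosen = rng.sample(range(len(buffer)), k)
    base = nearest_centroid_accuracy(buffer, val)
    values = {}
    for i in chosen:
        without = buffer[:i] + buffer[i + 1:]
        values[i] = base - nearest_centroid_accuracy(without, val)
    return values


if __name__ == "__main__":
    buffer = make_toy_data(200, seed=1)      # 200 items, as in the Experiment Setup row
    validation = make_toy_data(20, seed=2)   # "small validation set"; size is an assumption
    values = leave_one_out_values(buffer, validation)
    print(len(values), "items valued")
```

A positive value means that removing the item lowered validation accuracy, and a negative value means that removing it helped. With a buffer this large and a classifier this simple, most values will be zero or close to it, which is a known weakness of leave-one-out estimates.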