Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Modelling the control of offline processing with reinforcement learning

Authors: Eleanor Spens, Neil Burgess, Tim Behrens

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using image classification, maze solving, and relational inference tasks, we show that the meta-controller learns an adaptive curriculum for offline learning. This lays the groundwork for normative predictions about replay in a range of experimental neuroscience tasks.
Researcher Affiliation | Academia | 1University of Oxford 2University College London
Pseudocode | No | The paper describes procedural steps in paragraph text and figures, but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code for all simulations can be found at https://github.com/ellie-as/rl-with-metacognitive-actions.
Open Datasets | Yes | Fashion-MNIST (a more challenging variant of MNIST featuring images of ten items of clothing) is used as a toy dataset [31].
Dataset Splits | No | The paper describes dynamic sampling and uses a small validation set for data valuation, but does not provide explicit, fixed training, testing, and validation splits for the main learning tasks or overall experimental reproduction in the traditional sense. For example, 'Note that we assume a small validation set exists for the data valuation element. To describe this more concretely, if the hippocampus contained many images, the marginal contribution of each of a subset (here, 25%) of images to the classification accuracy on a small validation set would be obtained.'
Hardware Specification | Yes | Simulations were run on Linux virtual machines with NVIDIA A100 GPUs, and on Mac OS with the MPS backend for GPU support.
Software Dependencies | No | Our experiments used Stable-Baselines3 [28] for the RL algorithms, Gymnasium [29] to create the custom environments, and PyTorch [30] for other neural network training.
Experiment Setup | Yes | At the start of each episode 200 new images are stored in a buffer representing the hippocampus. The reward in the wake state, after three offline steps in the asleep state, is the accuracy of a classifier. Fashion-MNIST (a more challenging variant of MNIST featuring images of ten items of clothing) is used as a toy dataset [31].
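
The episode structure quoted under Experiment Setup (200 new items stored in a buffer, three offline steps, then a wake-state reward equal to classifier accuracy) can be illustrated with a minimal sketch. This is not the authors' implementation, which uses Gymnasium, Stable-Baselines3, PyTorch, and Fashion-MNIST. Everything beyond the three quoted numbers is an assumption made for illustration: the hypothetical `SleepWakeEnv` class, the synthetic one-dimensional "images", the nearest-centroid classifier, the validation set size of 100, and an action defined as the set of class labels to replay. The `reset`/`step` methods only loosely mimic the Gymnasium interface.

```python
import random

N_IMAGES = 200       # new items stored per episode (from the quoted setup)
OFFLINE_STEPS = 3    # asleep steps before the wake-state reward (from the quoted setup)
N_CLASSES = 10       # Fashion-MNIST has ten classes


def make_items(n, rng):
    """Synthetic stand-in for images: (feature, label) pairs (assumption)."""
    items = []
    for _ in range(n):
        label = rng.randrange(N_CLASSES)
        items.append((label + rng.gauss(0.0, 0.3), label))
    return items


class SleepWakeEnv:
    """Hypothetical toy environment: fill buffer, replay offline, reward on waking."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        # Small validation set; the size of 100 is an assumption.
        self.validation = make_items(100, self.rng)

    def reset(self):
        self.buffer = make_items(N_IMAGES, self.rng)  # the "hippocampus"
        self.sums = [0.0] * N_CLASSES    # per-class sums of replayed features
        self.counts = [0] * N_CLASSES    # per-class counts of replayed items
        self.steps_taken = 0
        return self.steps_taken

    def _accuracy(self):
        # Nearest-centroid classifier; classes never replayed have no centroid.
        centroids = {c: self.sums[c] / self.counts[c]
                     for c in range(N_CLASSES) if self.counts[c]}
        if not centroids:
            return 0.0
        correct = 0
        for x, label in self.validation:
            pred = min(centroids, key=lambda c: abs(x - centroids[c]))
            correct += pred == label
        return correct / len(self.validation)

    def step(self, action):
        """action: set of class labels to replay from the buffer (assumption)."""
        for x, label in self.buffer:
            if label in action:
                self.sums[label] += x
                self.counts[label] += 1
        self.steps_taken += 1
        done = self.steps_taken == OFFLINE_STEPS
        # Reward is only given in the wake state, after the last offline step.
        reward = self._accuracy() if done else 0.0
        return self.steps_taken, reward, done


# Usage: a fixed replay schedule standing in for a learned meta-controller.
env = SleepWakeEnv(seed=0)
env.reset()
for classes in ({0, 1, 2, 3}, {4, 5, 6}, {7, 8, 9}):
    obs, reward, done = env.step(classes)
print("wake-state reward:", reward)
```

In this sketch the reward depends on which classes were replayed before waking, which is the kind of signal a meta-controller could use to learn a replay curriculum. The actual observation and action spaces are defined in the linked repository.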