Computing a human-like reaction time metric from stable recurrent vision models
Authors: Lore Goetschalckx, Lakshmi Narasimhan Govindarajan, Alekh Karkada Ashok, Aarit Ahuja, David Sheinberg, Thomas Serre
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our metric aligns with patterns of human reaction times for stimulus manipulations across four disparate visual decision-making tasks spanning perceptual grouping, mental simulation, and scene categorization. This work paves the way for exploring the temporal alignment of model and human visual strategies in the context of various other cognitive tasks toward generating testable hypotheses for neuroscience. |
| Researcher Affiliation | Academia | Lore Goetschalckx 1,2; Lakshmi N. Govindarajan 3; Alekh K. Ashok 1,2; Aarit Ahuja 4; David L. Sheinberg 4; and Thomas Serre 1,2. 1 Department of Cognitive, Linguistic, & Psychological Sciences, Brown University, Providence, RI 02912; 2 Carney Institute for Brain Science, Brown University, Providence, RI 02912 ({lore_goetschalckx,alekh_karkada_ashok,thomas_serre}@brown.edu); 3 Integrative Computational Neuroscience (ICoN) Center, MIT, Cambridge, MA 02139 (lakshmin@mit.edu); 4 Neuroscience Department, Brown University, Providence, RI 02912 ({aarit_ahuja,david_sheinberg}@brown.edu) |
| Pseudocode | No | The paper provides mathematical formulations and equations for its model implementation and training in Supplementary Information A.1 and A.2, but it does not include a distinct block, figure, or section explicitly labeled as "Pseudocode" or "Algorithm." |
| Open Source Code | Yes | Links to the code and data can be found on the project page: https://serre-lab.github.io/rnn_rts_site/. |
| Open Datasets | Yes | We created 340K (15K) training (validation) outline stimuli from MS COCO images [51]. The Planko task (and corresponding dataset) was introduced in [53]... To build our training (N = 34.5K) and validation (N = 14K) set, we started from mazes provided by [40]... We selected images from the SUN database [58] querying the same classes as in [57]... |
| Dataset Splits | Yes | We created 340K (15K) training (validation) outline stimuli from MS COCO images [51]. To build our training (N = 34.5K) and validation (N = 14K) set... We selected images from the SUN database [58] querying the same classes as in [57], but excluding the images shown to human participants, and performed a stratified train (N = 8.8K) and validation (N = 979) split. |
| Hardware Specification | Yes | The cRNNs were trained in a data-parallel manner on 4 Nvidia RTX GPUs (Titan/3090/A5000) with 24 GB of memory each. |
| Software Dependencies | No | The paper mentions using PyTorch and torchvision and names the Adam optimizer, but it does not provide version numbers for these libraries or other dependencies, which are needed to fully reproduce the software environment. |
| Experiment Setup | Yes | The cRNNs were trained in a data-parallel manner on 4 Nvidia RTX GPUs (Titan/3090/A5000) with 24 GB of memory each. They were furthermore trained with Adam [50], a learning rate of 1e-3, and γ = 100 for the C-RBP penalty ([8]; also see SI A). We use T = 40 recurrent timesteps during the training phase, with an exception for the task in Section 6, where T = 80 and γ = 1000. We choose τ = 16 for incremental grouping, τ = 10 for Planko, τ = 20 for the maze task, and τ = 16 for scene categorization. All models listed below were trained from scratch with a batch size of 128 and a standard cross entropy loss. |
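
The "Dataset Splits" row above reports a stratified train/validation split for the scene-categorization images drawn from the SUN database. A minimal sketch of such a split is shown below; the dummy image IDs, the 10-class label scheme, and the 10% validation fraction are assumptions for illustration only and do not come from the paper.

```python
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the SUN image list and scene labels (hypothetical placeholders).
image_ids = list(range(1000))
class_labels = [i % 10 for i in range(1000)]  # 10 assumed scene classes

train_ids, val_ids, train_labels, val_labels = train_test_split(
    image_ids,
    class_labels,
    test_size=0.1,          # assumed fraction; the paper reports 8.8K train / 979 validation images
    stratify=class_labels,  # preserve class proportions in both splits
    random_state=0,
)
```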
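The "Software Dependencies" row notes that library versions are not reported. A small script like the following, run in the training environment, would record them; its output reflects whatever is installed locally, not necessarily the authors' environment.

```python
# Record library versions so the software environment can be reconstructed later.
import torch
import torchvision

print(f"torch=={torch.__version__}")
print(f"torchvision=={torchvision.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
```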
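The "Experiment Setup" row lists the reported optimization settings (Adam, learning rate 1e-3, batch size 128, cross-entropy loss, T = 40 recurrent timesteps, C-RBP penalty weight γ = 100). The sketch below wires those settings into a generic training step. The `ConvRNN` architecture and `stability_penalty` function are hypothetical placeholders, not the paper's cRNN or its C-RBP penalty, which are defined in the authors' code and SI A.1-A.2.

```python
# Minimal sketch of the reported optimization setup, assuming placeholder model and penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvRNN(nn.Module):
    """Placeholder convolutional RNN unrolled for T timesteps (not the paper's cRNN)."""
    def __init__(self, in_channels: int, hidden_channels: int, num_classes: int):
        super().__init__()
        self.input = nn.Conv2d(in_channels, hidden_channels, 3, padding=1)
        self.recurrent = nn.Conv2d(hidden_channels, hidden_channels, 3, padding=1)
        self.readout = nn.Linear(hidden_channels, num_classes)

    def forward(self, x: torch.Tensor, timesteps: int = 40):
        drive = self.input(x)
        h = torch.zeros_like(drive)
        for _ in range(timesteps):
            h = torch.relu(drive + self.recurrent(h))
        return self.readout(h.mean(dim=(2, 3))), h

def stability_penalty(h: torch.Tensor) -> torch.Tensor:
    """Placeholder for the C-RBP stability penalty; the actual form is given in SI A."""
    return h.pow(2).mean()

model = ConvRNN(in_channels=3, hidden_channels=32, num_classes=2)
# For multi-GPU training the model could be wrapped, e.g. with torch.nn.DataParallel(model).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # reported learning rate
gamma = 100.0                                              # reported C-RBP penalty weight

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step with cross-entropy plus the weighted stability penalty."""
    optimizer.zero_grad()
    logits, hidden = model(images, timesteps=40)  # T = 40 during training
    loss = F.cross_entropy(logits, labels) + gamma * stability_penalty(hidden)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Batches of 128 images (as reported) would be fed to `train_step` from a standard `torch.utils.data.DataLoader`; the task-specific τ values listed in the row govern the reaction-time readout rather than this training loop.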