Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

V1T: large-scale mouse V1 response prediction using a Vision Transformer

Authors: Bryan M. Li, Isabel Maria Cornacchia, Nathalie Rochefort, Arno Onken

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model on two large datasets recorded from mouse primary visual cortex and outperform previous convolution-based models by more than 12.7% in prediction performance. Moreover, we show that the self-attention weights learned by the Transformer correlate with the population receptive fields. Our model thus sets a new benchmark for neural response prediction and can be used jointly with behavioral and neural recordings to reveal meaningful characteristic features of the visual cortex.
Researcher Affiliation | Academia | School of Informatics, University of Edinburgh; Centre for Discovery Brain Sciences, University of Edinburgh; Simons Initiative for the Developing Brain, University of Edinburgh
Pseudocode | No | The paper describes the model architecture and mathematical formulations (e.g., equations 1, 2, 3) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code available at github.com/bryanlimy/V1T.
Open Datasets | Yes | We considered two large-scale neural datasets for this work, Dataset S1 by Willeke et al. (2022) and Dataset F by Franke et al. (2022). These two datasets consist of V1 recordings from behaving rodents in response to thousands of natural images, providing an excellent platform to evaluate our proposed method and compare it against previous visual predictive models.
Dataset Splits | Yes | Each recording session consists of up to 6,000 image presentations (i.e. trials), where 5,000 unique images are combined with 10 repetitions of 100 additional unique images, randomly intermixed. The 1,000 trials with repeated images are used as the test set and the rest are divided into train and validation sets with a split ratio of 90% and 10% respectively.
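The split described in this row can be sketched as follows. This is a minimal illustration only, not the datasets' actual loader; the trial layout and random seed are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption; the paper does not state one

# Hypothetical trial layout for one session: 5,000 unique images plus
# 100 additional images shown 10 times each, randomly intermixed.
image_ids = np.concatenate([np.arange(5000), np.tile(np.arange(5000, 5100), 10)])
rng.shuffle(image_ids)

# The 1,000 trials whose image appears more than once form the test set.
ids, counts = np.unique(image_ids, return_counts=True)
is_test = np.isin(image_ids, ids[counts > 1])
test_idx = np.flatnonzero(is_test)

# The remaining trials are split 90% train / 10% validation.
rest_idx = np.flatnonzero(~is_test)
rng.shuffle(rest_idx)
n_train = int(0.9 * len(rest_idx))
train_idx, val_idx = rest_idx[:n_train], rest_idx[n_train:]
```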
Hardware Specification | Yes | Each model was trained on a single Nvidia RTX 2080Ti GPU and all models converged within 200 epochs.
Software Dependencies | No | The paper mentions software tools like 'AdamW optimizer (Loshchilov and Hutter, 2019)' and 'SciPy's curve_fit() function' but does not provide specific version numbers for any key software components or libraries.
Experiment Setup | Yes | We used the same train, validation and test split provided by the two datasets (see Section 2). Natural images, recorded responses, and behavioral variables (i.e. pupil dilation, dilation derivative, pupil center, running speed) were standardized using the mean and standard deviation measured from the training set and the images were then resized to 36 × 64 pixels from 144 × 256 pixels. The shared core and per-animal readout modules were trained jointly using the AdamW optimizer (Loshchilov and Hutter, 2019) to minimize the Poisson loss... A small value ε = 1e-8 was added to both r and o prior to the loss calculation to improve numeric stability. Gradients from each mouse were accumulated before a single gradient update to all modules... We used a learning rate scheduler in conjunction with early stopping: if the validation loss did not improve over 10 consecutive epochs, we reduced the learning rate by a factor of 0.3; if the model still had not improved after 2 learning rate reductions, we then terminated the training process. Dropout (Srivastava et al., 2014), stochastic depths (Huang et al., 2016), and L1 weight regularization were added to prevent overfitting. The weights in dense layers were initialized by sampling from a truncated normal distribution (µ = 0.0, σ = 0.02), where the bias values were set to 0.0; whereas the weight and bias in Layer Norm were set to 1.0 and 0.0. Each model was trained on a single Nvidia RTX 2080Ti GPU and all models converged within 200 epochs. Finally, we employed Hyperband Bayesian optimization (Li et al., 2017) to find the hyperparameters that achieved the best performance in the validation set. This included finding the optimal tokenization method and self-attention mechanism. The initial search space and final hyperparameter settings are detailed in Table A.2.
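The loss and training-control details quoted in this row can be sketched in plain Python/NumPy. This is an illustrative reimplementation under stated assumptions, not the authors' code; the class and function names are invented here, and only the constants quoted above (ε = 1e-8, patience 10, factor 0.3, 2 reductions) come from the paper:

```python
import numpy as np

EPS = 1e-8  # small value added to both responses r and outputs o, per the paper


def poisson_loss(output, response):
    """Poisson negative log-likelihood up to constants: sum(o - r * log(o))."""
    o = np.asarray(output, dtype=float) + EPS
    r = np.asarray(response, dtype=float) + EPS
    return float(np.sum(o - r * np.log(o)))


class ReduceThenStop:
    """Reduce the learning rate by `factor` if validation loss has not improved
    for `patience` consecutive epochs; terminate after `max_reductions` reductions
    (illustrative re-creation of the schedule described in the paper)."""

    def __init__(self, lr, patience=10, factor=0.3, max_reductions=2):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.max_reductions = max_reductions
        self.best = float("inf")
        self.wait = 0
        self.reductions = 0

    def step(self, val_loss):
        """Call once per epoch; returns True when training should terminate."""
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0
            return False
        self.wait += 1
        if self.wait > self.patience:
            if self.reductions >= self.max_reductions:
                return True  # no improvement after 2 reductions: stop training
            self.lr *= self.factor
            self.reductions += 1
            self.wait = 0
        return False
```

In an actual training loop, `sched.lr` would be written back into the optimizer's parameter groups after each `step()` call.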