Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Transformer brain encoders explain human high-level visual responses

Authors: Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We run our experiments on the Natural Scene Dataset (NSD; [3]) where the fMRI (functional magnetic resonance imaging) responses were collected from 8 subjects, each seeing up to 10,000 images. The reported results are from subjects 1, 2, 5, and 7 who completed all recording sessions. ... All models were trained to predict the most visually responsive vertices in the brain. Our analyses focused on a subset of approximately 15k vertices for each left and right hemispheres (LH and RH), shown in Figure 2A on a surface map. ... We present results using multiple different feature backbones namely, DINOv2 base model [38], ResNet50 [20], and CLIP large model [41]. ... Table 1 shows the encoding accuracy of the encoding models using the DINOv2 backbone.
Researcher Affiliation	Academia	Hossein Adeli Zuckerman Mind Brain Behavior Institute Columbia University EMAIL; Sun Minni Zuckerman Mind Brain Behavior Institute Columbia University EMAIL; Nikolaus Kriegeskorte Zuckerman Mind Brain Behavior Institute Columbia University EMAIL
Pseudocode	No	The paper describes the model architecture (Figure 1) and its components verbally and visually, but does not present any structured pseudocode or algorithm blocks with labeled steps.
Open Source Code	Yes	Our code is available at https://github.com/Hosseinadeli/transformer_brain_encoder/.
Open Datasets	Yes	We run our experiments on the Natural Scene Dataset (NSD; [3]) where the fMRI (functional magnetic resonance imaging) responses were collected from 8 subjects, each seeing up to 10,000 images.
Dataset Splits	Yes	We use the train/test split that was introduced in the Algonauts benchmark [17] where the last three sessions for each subject were held out to ensure that no test data were accessed during the model development and to make the prediction task as natural as possible (predicting the future responses). ... For all models including all the baselines, we did 10-fold cross validation using the training set for each subject and averaged the model predictions across all folds.
Hardware Specification	Yes	We used GPUs (NVIDIA L40s), memory, and storage resources from an internal cluster. Storage for the entire project totals roughly 3TB. Training the model used roughly 4,000 GPU hours.
Software Dependencies	No	The paper mentions using specific models and tools such as DINOv2 [38], ResNet50 [20], CLIP [41], Adam optimizer [25], Pycortex [15], YOLOv5 [22], YOLOv8-face [11], and DeepGaze [30]. However, it does not provide specific version numbers for underlying software frameworks or libraries like Python, PyTorch, or TensorFlow, which are essential for full reproducibility.
Experiment Setup	Yes	The ROI queries, transformer decoder layer and the linear mappings are trained with the Adam optimizer [25] using mean-squared-error loss between the prediction and the ground truth fMRI activity for each image. We train and test the models separately for each subject. ... For all models including all the baselines, we did 10-fold cross validation using the training set for each subject and averaged the model predictions across all folds. ... The Ridge regression model ... We used a grid search to select the best ridge penalty to maximize performance on the validation data.
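
The Dataset Splits and Experiment Setup rows describe 10-fold cross-validation on the training set, a grid search over the ridge penalty, and averaging of predictions across folds. The following is a minimal sketch of that procedure for the ridge baseline only, not the authors' implementation. Several details are assumptions made for illustration: the function names `fit_ridge` and `cv_ridge_predict`, the closed-form solver without an intercept, random fold assignment, validation mean squared error as the selection criterion (the paper says only "performance"), the penalty grid, and the synthetic data standing in for image features and fMRI responses.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge weights for penalty alpha (no intercept; an assumption)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def cv_ridge_predict(X_train, Y_train, X_test, alphas, n_folds=10, seed=0):
    """Average held-out test predictions over folds.

    Each fold fits on the remaining folds and picks its own penalty by
    validation MSE (the selection metric is an assumption).
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X_train)), n_folds)
    test_preds = []
    for val_idx in folds:
        fit_idx = np.setdiff1d(np.arange(len(X_train)), val_idx)
        # Grid search: keep the weights with the lowest validation MSE.
        best_w, best_mse = None, np.inf
        for alpha in alphas:
            w = fit_ridge(X_train[fit_idx], Y_train[fit_idx], alpha)
            mse = np.mean((X_train[val_idx] @ w - Y_train[val_idx]) ** 2)
            if mse < best_mse:
                best_w, best_mse = w, mse
        test_preds.append(X_test @ best_w)
    # Final prediction is the mean over the per-fold models.
    return np.mean(test_preds, axis=0)

# Synthetic stand-in data: 200 "images", 5 features, 3 "vertices".
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
W = rng.normal(size=(5, 3))
Y = X @ W + 0.1 * rng.normal(size=(200, 3))

# The last 50 rows play the role of the held-out test sessions.
pred = cv_ridge_predict(X[:150], Y[:150], X[150:], alphas=[0.01, 1.0, 100.0])
print(pred.shape)  # (50, 3)
```

Averaging the per-fold predictions, rather than refitting one model on the full training set, treats the ten fold models as a small ensemble, which matches the quoted description of averaging "the model predictions across all folds".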