Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

In Silico Mapping of Visual Categorical Selectivity Across the Whole Brain

Authors: Ethan Hwang, Hossein Adeli, Wenxuan Guo, Andrew Luo, Nikolaus Kriegeskorte

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We propose an in silico approach for data-driven discovery of novel category-selectivity hypotheses based on an encoder-decoder transformer model. The architecture incorporates a brain-region to image-feature cross-attention mechanism, enabling nonlinear mappings between high-dimensional deep network features and semantic patterns encoded in the brain activity. We further introduce a method to characterize the selectivity of individual parcels by leveraging diffusion-based image generative models and large-scale datasets to synthesize and select images that maximally activate each parcel. Our approach reveals regions with complex, compositional selectivity involving diverse semantic concepts, which we validate in silico both within and across subjects. Using a brain encoder as a digital twin offers a powerful, data-driven framework for generating and testing hypotheses about visual selectivity in the human brain, hypotheses that can guide future fMRI experiments.
Researcher Affiliation	Academia	Ethan Hwang, Zuckerman Mind Brain Behavior Institute, Columbia University; Hossein Adeli, Zuckerman Mind Brain Behavior Institute, Columbia University; Wenxuan Guo, Zuckerman Mind Brain Behavior Institute, Columbia University; Andrew Luo, University of Hong Kong; Nikolaus Kriegeskorte, Zuckerman Mind Brain Behavior Institute, Columbia University
Pseudocode	No	The paper describes the architecture and methods in detail in Section 3 and subsections, but it does not include a specific pseudocode block or algorithm section labeled as such.
Open Source Code	Yes	Our code is available at: https://kriegeskorte-lab.github.io/in-silico-mapping/.
Open Datasets	Yes	We used the NSD [4], the largest fMRI dataset to date, with 7T fMRI responses from 8 subjects who each viewed up to 10,000 distinct natural scenes. ... Diffusion-generated superstimuli: BrainDIVE [46] uses a generative backbone guided by gradients from a brain encoder... Encoder-selected ImageNet superstimuli: ImageNet [17] images that maximally activate the parcel, according to the encoder.
Dataset Splits	Yes	We conducted two tests against the NSD test set, the only held-out fMRI data not used during training. ... For Test 1, we asked whether semantic similarity to a label predicted the activation rank order of NSD test images better than chance. Our pipeline represents a label as the mean CLIP embedding of the top 32 ImageNet or BrainDIVE images (both shown separately); the baseline uses the top 32 NSD-training images. ... Retrieval set: Because each subject's NSD training set contains distinct images, the ranking retrieval is performed on that subject's own NSD training set.
Hardware Specification	Yes	We used CPU (AMD EPYC 7662), GPU (NVIDIA A40, L40), memory, and storage resources from an internal cluster. Storage for the entire project totals roughly 10TB. Training the model used roughly 3,000 GPU hours, 24,000 CPU core hours, and 32 GB per GPU hour. Running the remaining experiments used roughly 20,000 GPU hours, 160,000 CPU core hours, and 32 GB per GPU hour.
Software Dependencies	No	The paper mentions several tools and models like Pycortex [25], Adam [39] (optimizer), DINOv2, CLIP, and GPT-4o [52] (for captioning), but it does not specify version numbers for general software dependencies like programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries critical for the core methodology.
Experiment Setup	Yes	Parcel queries, the transformer decoder, and linear mappings are optimized using Adam [39] to minimize the mean squared error between the predicted and actual fMRI responses. All other layers, including the backbone, are frozen. Separate models are trained for each subject. To improve prediction accuracy, we ensemble multiple instances of the brain encoder. For each subject, we trained two random seeds with features from four different DINOv2 backbone layers (the 0th, 2nd, 4th, and 6th layers from the last). To predict a vertex, we take the weighted average across model predictions, scaled by softmax weights from validation set accuracy for each model on that vertex. ... The resulting beta estimates were centered to zero mean and scaled to unit variance before training and experiments.
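
The ensembling step quoted in the Experiment Setup row (a per-vertex weighted average of model predictions, with softmax weights derived from each model's validation accuracy on that vertex) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the list-of-lists data layout, and the `temperature` parameter are assumptions made for illustration, and the paper excerpt does not state whether any temperature or scaling is applied before the softmax.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats.

    `temperature` is a hypothetical knob; the quoted text does not mention one.
    """
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(predictions, val_accuracy, temperature=1.0):
    """Combine per-model predictions into one prediction per vertex.

    predictions[m][v]  : model m's predicted response at vertex v
    val_accuracy[m][v] : model m's validation accuracy at vertex v
    Both layouts are assumptions for this sketch.
    """
    n_models = len(predictions)
    n_vertices = len(predictions[0])
    combined = []
    for v in range(n_vertices):
        # Weights are computed independently for every vertex.
        weights = softmax(
            [val_accuracy[m][v] for m in range(n_models)], temperature
        )
        combined.append(
            sum(w * predictions[m][v] for m, w in enumerate(weights))
        )
    return combined
```

Under the quoted setup, the ensemble for one subject would hold eight models (two random seeds times four DINOv2 backbone layers), so `predictions` and `val_accuracy` would each have eight rows. Models with equal validation accuracy on a vertex receive equal weight there, and a model that is much more accurate on a vertex dominates the prediction for that vertex.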