MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data

Authors: Paul Steven Scotti, Mihir Tripathy, Cesar Torrico, Reese Kneeland, Tong Chen, Ashutosh Narang, Charan Santhirasegaran, Jonathan Xu, Thomas Naselaris, Kenneth A. Norman, Tanishq Mathew Abraham

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The present work showcases high-quality reconstructions using only 1 hour of fMRI training data. We pretrain our model across 7 subjects and then fine-tune on minimal data from a new subject. ... We evaluate the fidelity of our SDXL unCLIP model to reconstruct images from ground truth OpenCLIP ViT-bigG/14 image embeddings in Appendix A.6, showing that reconstructions are nearly identical to the original images. ... We conducted behavioral experiments with online human raters to confirm that people subjectively prefer the refined reconstructions compared to the unrefined reconstructions...
Researcher Affiliation | Collaboration | (1) Stability AI, (2) Medical AI Research Center (MedARC), (3) Princeton Neuroscience Institute, (4) University of Minnesota, (5) The University of Sydney, (6) University of Waterloo. Correspondence to: Paul Scotti <scottibrain@gmail.com>.
Pseudocode | Yes | Algorithm 1: PyTorch code to convert OpenCLIP bigG to CLIP L. (A hedged sketch of such a conversion appears after the table.)
Open Source Code | Yes | All code is available on GitHub.
Open Datasets | Yes | We used the Natural Scenes Dataset (NSD) (Allen et al., 2022), a public fMRI dataset containing the brain responses of human participants viewing rich naturalistic stimuli from COCO (Lin et al., 2014).
Dataset Splits | No | The paper describes the use of training and test sets and refers to a 'standardized approach to train/test splits used by other NSD reconstruction papers', but it does not explicitly specify a separate validation split with details such as size or percentage. (A sketch of the standard NSD split convention appears after the table.)
Hardware Specification | Yes | Single-subject models were trained/fine-tuned on a single 8x A100 80GB GPU node for 150 epochs with a batch size of 24. Multi-subject pretraining was done with a batch size of 63 (9 samples per each of 7 subjects). Models were trained with Huggingface Accelerate (Gugger et al., 2022) and DeepSpeed (Rajbhandari et al., 2020) Stage 2 with CPU offloading. ... We fine-tuned SDXL on one 8x A100 80GB GPU node using an internal dataset for 110,000 optimization steps at a resolution of 256x256 pixels and a batch size of 8 with offset-noise (Lin et al., 2024; Guttenberg, 2023) set to 0.04. (An Accelerate + DeepSpeed configuration sketch appears after the table.)
Software Dependencies | No | Models were trained with Huggingface Accelerate (Gugger et al., 2022) and DeepSpeed (Rajbhandari et al., 2020) Stage 2 with CPU offloading. The paper mentions software tools such as Huggingface Accelerate, DeepSpeed, and PyTorch (implied by the code snippet), but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | Single-subject models were trained/fine-tuned on a single 8x A100 80GB GPU node for 150 epochs with a batch size of 24. Multi-subject pretraining was done with a batch size of 63 (9 samples per each of 7 subjects). ... We fine-tuned SDXL on one 8x A100 80GB GPU node using an internal dataset for 110,000 optimization steps at a resolution of 256x256 pixels and a batch size of 8 with offset-noise (Lin et al., 2024; Guttenberg, 2023) set to 0.04. All other settings were identical to those used with base Stable Diffusion XL. (An offset-noise sketch appears after the table.)
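
On the pseudocode row: the paper's Algorithm 1 gives PyTorch code for converting OpenCLIP bigG embeddings into CLIP L space, but the algorithm itself is not reproduced here. The following is a minimal sketch assuming the conversion is a learned linear projection between pooled image embeddings; the dimensions (1280 for ViT-bigG/14, 768 for ViT-L/14) and the MSE objective are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Assumed dimensions: OpenCLIP ViT-bigG/14 pooled image embeddings
# are 1280-d; CLIP ViT-L/14 pooled embeddings are 768-d.
BIGG_DIM, CLIPL_DIM = 1280, 768

converter = nn.Linear(BIGG_DIM, CLIPL_DIM)
optimizer = torch.optim.AdamW(converter.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(bigg_emb: torch.Tensor, clipl_emb: torch.Tensor) -> float:
    """One optimization step on paired (bigG, CLIP L) embeddings."""
    optimizer.zero_grad()
    pred = converter(bigg_emb)       # (batch, 1280) -> (batch, 768)
    loss = loss_fn(pred, clipl_emb)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random stand-in embeddings:
loss = train_step(torch.randn(24, BIGG_DIM), torch.randn(24, CLIPL_DIM))
```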
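On the dataset-splits row: NSD reconstruction papers conventionally hold out the roughly 1,000 COCO images shown to every subject (the "shared1000" set) as the test set and train on the remaining subject-specific images. A hypothetical sketch of that convention, using placeholder ID arrays rather than actual NSD metadata:

```python
import numpy as np

# Placeholder ID arrays; real NSD metadata identifies the ~1000
# shared images shown to all subjects.
image_ids = np.arange(10_000)              # all images seen by one subject
shared1000_ids = np.arange(9_000, 10_000)  # IDs of the shared test images

test_mask = np.isin(image_ids, shared1000_ids)
train_ids = image_ids[~test_mask]  # subject-specific images -> training
test_ids = image_ids[test_mask]    # shared images -> test
print(len(train_ids), len(test_ids))  # 9000 1000
```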
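On the hardware row: the reported training harness, Huggingface Accelerate driving DeepSpeed ZeRO Stage 2 with CPU offloading, can be set up along these lines. Only the Stage 2 + CPU-offload plugin settings and the batch size of 24 come from the paper; the model, data, and learning rate are placeholders, and the script must be started with `accelerate launch` for DeepSpeed to initialize.

```python
import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# DeepSpeed ZeRO Stage 2 with optimizer-state offloading to CPU,
# as reported in the paper.
ds_plugin = DeepSpeedPlugin(zero_stage=2, offload_optimizer_device="cpu")
accelerator = Accelerator(deepspeed_plugin=ds_plugin)

model = torch.nn.Linear(4096, 4096)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = torch.utils.data.TensorDataset(torch.randn(240, 4096))
loader = torch.utils.data.DataLoader(dataset, batch_size=24)  # paper's single-subject batch size

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for (x,) in loader:
    loss = model(x).pow(2).mean()  # dummy objective
    accelerator.backward(loss)     # routes backward through DeepSpeed
    optimizer.step()
    optimizer.zero_grad()
```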
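On the experiment-setup row: offset-noise (Guttenberg, 2023) perturbs the standard Gaussian noise used in diffusion training with a per-channel constant offset, which the paper sets to 0.04 for the SDXL fine-tune. A minimal sketch, with illustrative latent shapes:

```python
import torch

def offset_noise(latents: torch.Tensor, strength: float = 0.04) -> torch.Tensor:
    """Standard Gaussian noise plus a scaled per-channel constant offset."""
    noise = torch.randn_like(latents)
    offset = torch.randn(latents.shape[0], latents.shape[1], 1, 1,
                         device=latents.device, dtype=latents.dtype)
    return noise + strength * offset

# Stand-in VAE latents; batch of 8 matches the reported SDXL
# fine-tuning batch size, spatial shape is illustrative.
latents = torch.randn(8, 4, 32, 32)
noise = offset_noise(latents)
```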