MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
Authors: Paul Steven Scotti, Mihir Tripathy, Cesar Torrico, Reese Kneeland, Tong Chen, Ashutosh Narang, Charan Santhirasegaran, Jonathan Xu, Thomas Naselaris, Kenneth A. Norman, Tanishq Mathew Abraham
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The present work showcases high-quality reconstructions using only 1 hour of fMRI training data. We pretrain our model across 7 subjects and then fine-tune on minimal data from a new subject. ... We evaluate the fidelity of our SDXL unCLIP model to reconstruct images from ground truth OpenCLIP ViT-bigG/14 image embeddings in Appendix A.6, showing that reconstructions are nearly identical to the original images. ... We conducted behavioral experiments with online human raters to confirm that people subjectively prefer the refined reconstructions compared to the unrefined reconstructions... |
| Researcher Affiliation | Collaboration | 1Stability AI 2Medical AI Research Center (MedARC) 3Princeton Neuroscience Institute 4University of Minnesota 5The University of Sydney 6University of Waterloo. Correspondence to: Paul Scotti <scottibrain@gmail.com>. |
| Pseudocode | Yes | Algorithm 1: PyTorch code to convert OpenCLIP bigG to CLIP L (a hedged re-implementation sketch appears after this table). |
| Open Source Code | Yes | All code is available on GitHub. |
| Open Datasets | Yes | We used the Natural Scenes Dataset (NSD) (Allen et al., 2022), a public fMRI dataset containing the brain responses of human participants viewing rich naturalistic stimuli from COCO (Lin et al., 2014). |
| Dataset Splits | No | The paper describes training and test sets and refers to the 'standardized approach to train/test splits used by other NSD reconstruction papers', but it does not specify a separate validation split or report split sizes or percentages (see the split sketch after this table). |
| Hardware Specification | Yes | Single-subject models were trained/fine-tuned on a single 8x A100 80GB GPU node for 150 epochs with a batch size of 24. Multi-subject pretraining was done with a batch size of 63 (9 samples per each of 7 subjects). Models were trained with Huggingface Accelerate (Gugger et al., 2022) and DeepSpeed (Rajbhandari et al., 2020) Stage 2 with CPU offloading. ... We fine-tuned SDXL on one 8x A100 80GB GPU node using an internal dataset for 110,000 optimization steps at a resolution of 256x256 pixels and a batch size of 8 with offset-noise (Lin et al., 2024; Guttenberg, 2023) set to 0.04. (A configuration sketch for DeepSpeed Stage 2 with CPU offloading appears after this table.) |
| Software Dependencies | No | Models were trained with Huggingface Accelerate (Gugger et al., 2022) and DeepSpeed (Rajbhandari et al., 2020) Stage 2 with CPU offloading. The paper names software tools such as Huggingface Accelerate, DeepSpeed, and PyTorch (implied by the code snippet), but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | Single-subject models were trained/fine-tuned on a single 8x A100 80GB GPU node for 150 epochs with a batch size of 24. Multi-subject pretraining was done with a batch size of 63 (9 samples per each of 7 subjects). ... We fine-tuned SDXL on one 8x A100 80GB GPU node using an internal dataset for 110,000 optimization steps at a resolution of 256x256 pixels and a batch size of 8 with offset-noise (Lin et al., 2024; Guttenberg, 2023) set to 0.04. All other settings were identical to those used with base Stable Diffusion XL. (An offset-noise sketch appears after this table.) |
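
The paper's Algorithm 1 provides PyTorch code for converting OpenCLIP bigG image embeddings to CLIP L embeddings; the listing below is not that code but a minimal re-implementation sketch, assuming pooled embedding widths of 1280 (bigG) and 768 (CLIP L) and a simple MSE-trained linear map. The class, function, and dimension choices here are assumptions, not the paper's method.

```python
# Hedged sketch (not the paper's Algorithm 1): learn a linear map from
# OpenCLIP bigG image embeddings to CLIP L image embeddings via MSE.
# The embedding widths (1280 -> 768) are assumed pooled-embedding sizes.
import torch
import torch.nn as nn

class BigGToClipL(nn.Module):
    def __init__(self, in_dim: int = 1280, out_dim: int = 768):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

def train_converter(bigg_embeds, clipl_embeds, epochs=10, lr=1e-4):
    """Fit the linear map on paired (bigG, CLIP L) embeddings of the same images."""
    model = BigGToClipL(bigg_embeds.shape[-1], clipl_embeds.shape[-1])
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(bigg_embeds), clipl_embeds)
        loss.backward()
        opt.step()
    return model
```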
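
On dataset splits: the standardized NSD protocol the review refers to holds out the roughly 1,000 COCO images shared across all subjects as the test set and trains on each subject's remaining trials. A minimal sketch, assuming a precomputed boolean `shared1000_mask` over one subject's trials; both variable names are hypothetical.

```python
# Hedged sketch of the standardized NSD train/test split: trials whose
# stimulus belongs to the shared-image set become the test set, everything
# else is training data. `shared1000_mask` is an assumed boolean array.
import numpy as np

def split_nsd_trials(betas: np.ndarray, shared1000_mask: np.ndarray):
    """betas: (n_trials, n_voxels) fMRI responses for one subject."""
    test_betas = betas[shared1000_mask]
    train_betas = betas[~shared1000_mask]
    return train_betas, test_betas
```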
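
On the training stack: the paper does not publish its Accelerate/DeepSpeed configuration, so the following is only one plausible way to request ZeRO Stage 2 with CPU optimizer offloading through Hugging Face Accelerate's `DeepSpeedPlugin`; every choice beyond `zero_stage=2` and CPU offloading is an assumption.

```python
# Hedged sketch: enabling DeepSpeed ZeRO Stage 2 with CPU optimizer
# offloading via Hugging Face Accelerate (requires `deepspeed` installed).
# The paper states Stage 2 + CPU offloading; the exact config is assumed.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(
    zero_stage=2,                   # ZeRO Stage 2: shard optimizer state and gradients
    offload_optimizer_device="cpu", # CPU offloading, as described in the paper
)
accelerator = Accelerator(deepspeed_plugin=ds_plugin)

# Typical usage afterwards:
# model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```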
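
On offset-noise: the quoted value of 0.04 matches the common implementation of offset noise (Guttenberg, 2023), in which a small per-channel constant is added to the sampled noise during diffusion fine-tuning. A minimal sketch, assuming standard latent-diffusion training variables; the function name is hypothetical.

```python
# Hedged sketch of offset-noise as commonly implemented for latent-diffusion
# fine-tuning; the 0.04 strength matches the paper, the surrounding names
# are assumptions.
import torch

def offset_noise(latents: torch.Tensor, strength: float = 0.04) -> torch.Tensor:
    """Gaussian noise plus a per-channel constant offset, giving the model a
    path to learn global brightness shifts that zero-mean noise suppresses."""
    noise = torch.randn_like(latents)
    offset = torch.randn(latents.shape[0], latents.shape[1], 1, 1,
                         device=latents.device, dtype=latents.dtype)
    return noise + strength * offset
```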