Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors

Authors: Paul Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth Norman, Tanishq Abraham

NeurIPS 2023

Reproducibility variables, with the assessed result and supporting LLM response for each:
Research Type: Experimental. We comprehensively compare our approach with other existing methods, using both qualitative side-by-side comparisons and quantitative evaluations, and show that MindEye achieves state-of-the-art performance in both reconstruction and retrieval tasks. In particular, MindEye can retrieve the exact original image even among highly similar candidates, indicating that its brain embeddings retain fine-grained image-specific information. This allows us to accurately retrieve images even from large-scale databases like LAION-5B. We demonstrate through ablations that MindEye's performance improvements over previous methods result from specialized submodules for retrieval and reconstruction, improved training techniques, and training models with orders of magnitude more parameters.
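To make the retrieval claim concrete, here is a minimal sketch of how exact-image retrieval with brain embeddings typically works: rank candidate images by cosine similarity to the embedding predicted from fMRI. The function name, shapes, and random stand-in data are assumptions for illustration, not the authors' evaluation code.

```python
import torch
import torch.nn.functional as F

def retrieve(brain_emb, candidate_clip_embs):
    """brain_emb: (d,) embedding predicted from fMRI;
    candidate_clip_embs: (n_candidates, d) CLIP embeddings of candidate images."""
    sims = F.cosine_similarity(brain_emb.unsqueeze(0), candidate_clip_embs, dim=-1)
    return sims.argsort(descending=True)  # index 0 is the best-matching image

# Example: one brain embedding vs. 5 candidate images (random stand-ins).
brain = torch.randn(768)
candidates = torch.randn(5, 768)
print(retrieve(brain, candidates)[0].item())  # index of the top retrieval
```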
Researcher Affiliation: Collaboration. Paul S. Scotti* (1,2), Atmadeep Banerjee* (2), Jimmie Goode (2), Stepan Shabalin (2), Alex Nguyen (1), Ethan Cohen (3), Aidan J. Dempster (4), Nathalie Verlinde (1), Elad Yundler (5), David Weisberg (1,2), Kenneth A. Norman (1), and Tanishq Mathew Abraham (2,6,7). Affiliations: 1. Princeton Neuroscience Institute; 2. Medical AI Research Center (MedARC); 3. École Normale Supérieure, PSL University; 4. University of Toronto; 5. Hebrew University of Jerusalem; 6. EleutherAI; 7. Stability AI.
Pseudocode: Yes. Algorithm 1 provides PyTorch code for the MindEye MLP backbone and MLP projector.
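As a rough illustration of the kind of model Algorithm 1 describes, the following is a minimal sketch of an MLP backbone with residual blocks plus an MLP projector mapping flattened voxels into a CLIP image-token space. All dimensions, layer counts, and the dropout rate are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class BrainMLPBackbone(nn.Module):
    """Maps flattened fMRI voxels to a hidden representation via residual MLP blocks."""
    def __init__(self, num_voxels=15724, hidden=4096, n_blocks=4):
        super().__init__()
        self.lin0 = nn.Linear(num_voxels, hidden)
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.LayerNorm(hidden),
                nn.GELU(),
                nn.Linear(hidden, hidden),
                nn.Dropout(0.15),
            )
            for _ in range(n_blocks)
        ])

    def forward(self, x):
        x = self.lin0(x)
        for block in self.blocks:
            x = x + block(x)  # residual connection
        return x

class MLPProjector(nn.Module):
    """Projects backbone features into a CLIP image-token space (here 257 x 768)."""
    def __init__(self, hidden=4096, clip_tokens=257, clip_dim=768):
        super().__init__()
        self.clip_tokens, self.clip_dim = clip_tokens, clip_dim
        self.proj = nn.Sequential(
            nn.LayerNorm(hidden),
            nn.GELU(),
            nn.Linear(hidden, clip_tokens * clip_dim),
        )

    def forward(self, x):
        return self.proj(x).view(-1, self.clip_tokens, self.clip_dim)

backbone, projector = BrainMLPBackbone(), MLPProjector()
tokens = projector(backbone(torch.randn(2, 15724)))  # -> (2, 257, 768)
```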
Open Source Code: Yes. All code is available on GitHub.
Open Datasets: Yes. For all experiments, we used the Natural Scenes Dataset (NSD) [26], a public fMRI dataset containing the brain responses of human participants passively viewing natural scenes from MS-COCO [27].
Dataset Splits: Yes. We used the same standardized train/test splits as other NSD reconstruction papers [3, 4, 28], training subject-specific models for each of 4 participants. We averaged across three same-image repetitions for the test set (leaving 982 test samples) but not the training set (24,980 training samples), similar to Takagi and Nishimoto [3]. For more information on NSD and data preprocessing, see Appendix A.2; for single-trial and reduced-dataset results, see Appendix A.9 and Appendix A.10.
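A small sketch of the split handling described above: average the three fMRI repetitions of each test image while leaving training trials unaveraged. Array names and shapes are illustrative assumptions.

```python
import numpy as np

def average_test_repetitions(betas, image_ids):
    """betas: (n_trials, n_voxels) single-trial responses;
    image_ids: (n_trials,) id of the image shown on each trial."""
    unique_ids = np.unique(image_ids)
    averaged = np.stack([betas[image_ids == i].mean(axis=0) for i in unique_ids])
    return averaged, unique_ids  # e.g. (982, n_voxels) for the NSD test split
```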
Hardware Specification: Yes. All our models are trained on a single A100 GPU for 240 epochs with a batch size of 32.
Software Dependencies: No. The paper mentions "PyTorch model code" in Algorithm 1, but it does not specify version numbers for PyTorch or any other software libraries used in the experiments.
Experiment Setup: Yes. We use α = 0.3 and switch from BiMixCo to SoftCLIP after one-third of the training cycle. All our models are trained on a single A100 GPU for 240 epochs with a batch size of 32. Despite a high parameter count, MindEye (including both high- and low-level pipelines) can be trained on a single A100 in less than 18 hours. This efficiency is due to the bulk of the parameters stemming from MLPs, which are faster to compute than transformers or CNNs.
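A minimal sketch of this schedule, assuming (as in the MixCo formulation) that α parameterizes the Beta distribution the mixing coefficients are drawn from, and that SoftCLIP is a bidirectional contrastive loss with soft targets derived from CLIP-to-CLIP similarities. This is an illustration of the described setup, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def mixco_inputs(voxels, alpha=0.3):
    """MixCo-style mixup of fMRI inputs with Beta(alpha, alpha) coefficients."""
    perm = torch.randperm(voxels.size(0))
    lam = torch.distributions.Beta(alpha, alpha).sample((voxels.size(0),)).to(voxels)
    mixed = lam.unsqueeze(-1) * voxels + (1 - lam.unsqueeze(-1)) * voxels[perm]
    return mixed, perm, lam

def soft_clip_loss(brain_emb, clip_emb, temp=0.05):
    """Contrastive loss whose soft targets come from CLIP-to-CLIP similarities."""
    brain = F.normalize(brain_emb.flatten(1), dim=-1)
    clip = F.normalize(clip_emb.flatten(1), dim=-1)
    logits = brain @ clip.T / temp
    targets = F.softmax(clip @ clip.T / temp, dim=-1)
    loss_b2c = -(targets * F.log_softmax(logits, dim=-1)).sum(-1).mean()
    loss_c2b = -(targets.T * F.log_softmax(logits.T, dim=-1)).sum(-1).mean()
    return (loss_b2c + loss_c2b) / 2

num_epochs = 240
for epoch in range(num_epochs):
    # Switch after one-third of training, as described in the setup.
    use_bimixco = epoch < num_epochs // 3
    # ... forward pass; apply a BiMixCo-style mixup loss if use_bimixco,
    # otherwise soft_clip_loss ...
```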