BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity

Authors: Andrew Luo, Margaret Marie Henderson, Michael J. Tarr, Leila Wehbe

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our method through fine-grained voxel-level captioning across higher-order visual regions. We further perform text-conditioned image synthesis with the captions, and show that our images are semantically coherent and yield high predicted activations. Finally, to demonstrate how our method enables scientific discovery, we perform exploratory investigations on the distribution of person representations in the brain, and discover fine-grained semantic selectivity in body-selective areas."
Researcher Affiliation | Academia | Andrew F. Luo (Carnegie Mellon University, afluo@cmu.edu); Margaret M. Henderson (Carnegie Mellon University, mmhender@cmu.edu); Michael J. Tarr (Carnegie Mellon University, michaeltarr@cmu.edu); Leila Wehbe (Carnegie Mellon University, lwehbe@cmu.edu)
Pseudocode | No | The paper describes its architecture and method through text and diagrams (Figure 1) but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Code and project site: https://www.cs.cmu.edu/~afluo/BrainSCUBA
Open Datasets | Yes | "We utilize the Natural Scenes Dataset (NSD; Allen et al. (2022)), the largest whole-brain 7T human visual stimuli dataset."
Dataset Splits | Yes | "The brain encoder is trained on the 9000 unique images for each subject, while the remaining 1000 images viewed by all are used to validate R²." (A split-and-R² sketch appears after the table.)
Hardware Specification | Yes | "We perform our experiments on a mixture of Nvidia V100 (16GB and 32GB variants), 4090, and 2080 Ti cards."
Software Dependencies | No | The paper mentions software such as 'pytorch', the 'CLIPCap network', 'GPT-2', 'stable-diffusion-2-1-base', and 'DPM-Solver++', but it does not specify version numbers for any of these components. (A version-logging sketch appears after the table.)
Experiment Setup | Yes | "For the encoder training, we use the Adam optimizer with decoupled weight decay set to 2e-2. Initial learning rate is set to 3e-4 and decays exponentially to 1.5e-4 over the 100 training epochs. ... During softmax projection, we set the temperature parameter to 1/150. ... Captions are generated using beam search with a beam width of 5." (A training-setup sketch appears after the table.)
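
The Dataset Splits row describes a simple held-out evaluation: train on each subject's 9000 unique images, then score per-voxel R² on the 1000 images shared across all subjects. Below is a minimal sketch of that split and metric; `image_ids`, `shared_ids`, and the array shapes are hypothetical names and conventions, not taken from the paper.

```python
import numpy as np

def split_nsd(image_ids: np.ndarray, shared_ids: set) -> tuple[np.ndarray, np.ndarray]:
    """Boolean masks: subject-unique images (train) vs. the 1000 shared images (val)."""
    val_mask = np.array([i in shared_ids for i in image_ids])
    return ~val_mask, val_mask

def validation_r2(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Per-voxel R^2 on the shared images; inputs are (n_images, n_voxels)."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot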
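
Because the Software Dependencies row flags missing version numbers, a reproducer may want to record the versions actually installed in their own environment. A minimal sketch, assuming the usual PyPI distribution names (torch, transformers, diffusers), which the paper does not state:

```python
# Log the installed versions, since the paper does not pin them.
# The package names below are common PyPI names and are an assumption.
import importlib.metadata as md

for pkg in ["torch", "transformers", "diffusers"]:
    try:
        print(f"{pkg}=={md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")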
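
For the Experiment Setup row, here is a minimal PyTorch sketch of the reported hyperparameters. "Adam with decoupled weight decay" corresponds to AdamW, and the per-epoch decay factor is derived from the stated start and end learning rates; the exact scheduler form, the encoder architecture, and the CLIP embedding bank used in the softmax projection are assumptions here, not details confirmed by the paper.

```python
import torch
import torch.nn.functional as F

# Stand-in encoder: the paper maps CLIP image embeddings to voxel responses;
# the dimensions here are illustrative only.
encoder = torch.nn.Linear(512, 1000)

# "Adam with decoupled weight decay" == AdamW; weight decay 2e-2, initial lr 3e-4.
optimizer = torch.optim.AdamW(encoder.parameters(), lr=3e-4, weight_decay=2e-2)

# Exponential decay from 3e-4 to 1.5e-4 over 100 epochs:
# gamma**100 = 1.5e-4 / 3e-4 = 0.5, so gamma = 0.5 ** (1 / 100) ≈ 0.9931.
# Call scheduler.step() once per epoch during training.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.5 ** (1 / 100))

def softmax_project(w: torch.Tensor, clip_bank: torch.Tensor,
                    temp: float = 1 / 150) -> torch.Tensor:
    """Project a voxel weight vector onto the span of CLIP image embeddings.

    w: (d,) encoder weight for one voxel; clip_bank: (n, d) unit-norm CLIP
    image embeddings. Returns a softmax-weighted (temperature 1/150) convex
    combination of the bank embeddings; the bank itself is an assumption here.
    """
    scores = clip_bank @ F.normalize(w, dim=0) / temp  # (n,) scaled similarities
    return F.softmax(scores, dim=0) @ clip_bank        # (d,)
```

The reported beam search (width 5) for caption generation would correspond to, e.g., num_beams=5 in a Hugging Face generate() call on the CLIPCap/GPT-2 captioner, though the paper does not show the decoding code.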