Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities

Authors: Jingyuan Sun, Mingxiao Li, Zijiao Chen, Yunhao Zhang, Shaonan Wang, Marie-Francine Moens

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in the 50-way-top-1 semantic classification accuracy.
Researcher Affiliation | Academia | Jingyuan Sun, KU Leuven, jingyuan.sun@kuleuven.be; Mingxiao Li, KU Leuven, mingxiao.li@kuleuven.be; Zijiao Chen, National University of Singapore, zijiao.chen@u.nus.edu; Yunhao Zhang, Chinese Academy of Sciences, zhangyunhao2021@ia.ac.cn; Shaonan Wang, Chinese Academy of Sciences, shaonan.wang@nlpr.ia.ac.cn; Marie-Francine Moens, KU Leuven, sien.moens@kuleuven.be
Pseudocode | Yes | Algorithm 1: Iterative Reasoning Module
Open Source Code | Yes | The code implementation will be available at https://github.com/soinx0629/vis_dec_neurips/.
Open Datasets | Yes | HCP: The Human Connectome Project (HCP) originally serves as an extensive exploration of the connectivity of the human brain. It offers an open-sourced database of neuroimaging and behavioral data collected from 1,200 healthy young adults aged 22-35 years. It currently stands as the largest public resource of MRI data on the human brain, providing an excellent foundation for pre-training representations of brain activation patterns. GOD: The Generic Object Decoding (GOD) dataset is a specialized resource developed for fMRI-based decoding. BOLD5000: The BOLD5000 dataset results from an extensive slow event-related human brain fMRI study. It comprises 5,254 images, 4,916 of them unique, making it one of the most comprehensive publicly available datasets in the field.
Dataset Splits | Yes | The training session incorporated 1,200 images (8 per category from 150 distinct object categories). In contrast, the test session included 50 images (one from each of the 50 object categories).
Hardware Specification | Yes | We set the batch size to 250 and train for 140 epochs on one NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions software components and models such as the ViT-based masked autoencoder (MAE), the LDM, and the AdamW optimizer, but does not specify their version numbers or the versions of any underlying programming languages or libraries.
Experiment Setup | Yes | For both FRL Phase 1 and Phase 2, the fMRI autoencoder is the same ViT-based masked autoencoder (MAE). We employ an asymmetric architecture in which the decoder (8 layers) is considerably smaller than the encoder (24 layers). We use a larger embedding-to-patch-size ratio, specifically a patch size of 16 and an embedding dimension of 1024. We use random sparsification (RS) as a form of data augmentation, randomly selecting 20% of the voxels in each fMRI and setting them to zero. FRL Phase 1: for GOD subjects 1, 4, and 5 and BOLD5000 CSI 1 and 2, the self-contrastive (γs) and cross-contrastive (γc) loss weights are both 1, and the masking ratio is 0.5; for GOD subjects 2 and 3 and BOLD5000 CSI 3 and 4, γs = 1, γc = 0.5, and the masking ratio is 0.75. We set the batch size to 250 and train for 140 epochs on one NVIDIA A100 GPU, with a 20-epoch warm-up and an initial learning rate of 2.5e-4, optimizing with AdamW and a weight decay of 0.05. FRL Phase 2: we set the batch size to 16 and train for 60 epochs, with a 2-epoch warm-up and an initial learning rate of 5.3e-5, again optimizing with AdamW and a weight decay of 0.05. Fine-tuning the LDM: we train with a batch size of 5, 1,000 diffusion steps, the AdamW optimizer, a learning rate of 5.5e-5, and an image resolution of 256×256×3.
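The random sparsification (RS) augmentation described in the setup is simple to sketch. The snippet below assumes each fMRI sample is a flat voxel array; the function name and signature are illustrative and not taken from the authors' released code:

```python
import numpy as np

def random_sparsify(fmri, ratio=0.2, rng=None):
    """Randomly zero out a `ratio` fraction of voxels (RS augmentation).

    `fmri` is an array of voxel values; each voxel is independently
    dropped with probability `ratio`, leaving ~(1 - ratio) intact.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(fmri.shape) >= ratio  # True = keep this voxel
    return fmri * mask
```

Applied per sample at training time, this acts like dropout on the input: with the paper's ratio of 0.2, roughly 80% of voxels survive each pass.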