Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities
Authors: Jingyuan Sun, Mingxiao Li, Zijiao Chen, Yunhao Zhang, Shaonan Wang, Marie-Francine Moens
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in the 50-way-top-1 semantic classification accuracy. |
| Researcher Affiliation | Academia | Jingyuan Sun KU Leuven jingyuan.sun@kuleuven.be; Mingxiao Li KU Leuven mingxiao.li@kuleuven.be; Zijiao Chen National University of Singapore zijiao.chen@u.nus.edu; Yunhao Zhang Chinese Academy of Science zhangyunhao2021@ia.ac.cn; Shaonan Wang Chinese Academy of Science shaonan.wang@nlpr.ia.ac.cn; Marie-Francine Moens KU Leuven sien.moens@kuleuven.be |
| Pseudocode | Yes | Algorithm 1 Iterative Reasoning Module |
| Open Source Code | Yes | The code implementations will be available at https://github.com/soinx0629/vis_dec_neurips/. |
| Open Datasets | Yes | HCP: The Human Connectome Project (HCP) originally serves as an extensive exploration into the connectivity of the human brain. It offers an open-sourced database of neuroimaging and behavioral data collected from 1,200 healthy young adults within the age range of 22-35 years. Currently, it stands as the largest public resource of MRI data pertaining to the human brain, providing an excellent foundation for the pre-training of brain activation pattern representations. GOD: The Generic Object Decoding (GOD) Dataset is a specialized resource developed for fMRI-based decoding. BOLD5000: The BOLD5000 dataset is a result of an extensive slow event-related human brain fMRI study. It comprises 5,254 images, with 4,916 of them being unique. This makes it one of the most comprehensive publicly available datasets in the field. |
| Dataset Splits | Yes | The training session incorporated 1,200 images (8 per category from 150 distinct object categories). In contrast, the test session included 50 images (one from each of the 50 object categories). |
| Hardware Specification | Yes | We set the batch size to 250 and train for 140 epochs on one NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions software components and models like 'ViT-based masked autoencoder (MAE)', 'LDM', and 'AdamW optimizer' but does not specify their version numbers or the versions of any underlying programming languages or libraries. |
| Experiment Setup | Yes | For both FRL Phase 1 and Phase 2, the fMRI autoencoder is the same ViT-based masked autoencoder (MAE). We employed an asymmetric architecture for the fMRI autoencoder, in which the decoder is considerably smaller with 8 layers than the encoder with 24 layers. We used a larger embedding-to-patch size ratio, specifically a patch size of 16 and an embedding dimension of 1024 for our model. We used random sparsification (RS) as a form of data augmentation, randomly selecting and setting 20% of voxels in each fMRI to zero. FRL Phase 1: For GOD subjects 1, 4, 5 and BOLD5000 CSI 1, 2, the self-contrastive (γs) and cross-contrastive (γc) loss weights are both 1 and the masking ratio is 0.5. For GOD subjects 2, 3 and BOLD5000 CSI 3, 4, γs = 1, γc = 0.5, and the masking ratio is 0.75. We set the batch size to 250 and train for 140 epochs on one NVIDIA A100 GPU. We train with 20-epoch warm-up and an initial learning rate of 2.5e-4. We optimize with AdamW and weight decay 0.05. FRL Phase 2: We set the batch size to 16 and train for 60 epochs, with 2-epoch warm-up and an initial learning rate of 5.3e-5. We optimize with AdamW and weight decay 0.05. Fine-tuning LDM: We conduct training with the following parameters: a batch size of 5, 1000 diffusion steps, the AdamW optimizer, a learning rate of 5.5e-5, and an image resolution of 256×256×3. (Illustrative sketches of these training settings follow the table.) |
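The FRL Phase 1 settings quoted above (random sparsification that zeroes 20% of voxels, AdamW with weight decay 0.05, a 20-epoch warm-up to an initial learning rate of 2.5e-4 over 140 epochs) can be sketched roughly as below. This is a minimal, hypothetical PyTorch sketch: the function names, the placeholder model, and the constant-after-warm-up schedule are assumptions for illustration, not the authors' implementation (see https://github.com/soinx0629/vis_dec_neurips/ for the released code).

```python
# Hypothetical sketch of the quoted FRL Phase 1 settings; the released
# repository may organize this quite differently.
import torch


def random_sparsification(fmri: torch.Tensor, drop_ratio: float = 0.2) -> torch.Tensor:
    """Randomly zero out `drop_ratio` of the voxels in each fMRI sample
    (the paper's random-sparsification augmentation)."""
    keep_mask = torch.rand_like(fmri) >= drop_ratio  # keep ~80% of voxels
    return fmri * keep_mask


# Placeholder module standing in for the asymmetric ViT-based MAE
# (24-layer encoder, 8-layer decoder, patch size 16, embedding dim 1024).
model = torch.nn.Linear(1024, 1024)

optimizer = torch.optim.AdamW(model.parameters(), lr=2.5e-4, weight_decay=0.05)

warmup_epochs, total_epochs = 20, 140


def lr_scale(epoch: int) -> float:
    """Linear warm-up over the first 20 epochs, then constant.
    (The table only states the warm-up; the post-warm-up schedule is assumed.)"""
    return min(1.0, (epoch + 1) / warmup_epochs)


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_scale)
```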
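For reference, the FRL Phase 2 and LDM fine-tuning hyper-parameters quoted in the same cell can be collected into plain configuration dictionaries. The key names here are assumptions chosen for readability, not the schema used in the authors' code.

```python
# Hypothetical consolidation of the quoted FRL Phase 2 and LDM fine-tuning
# hyper-parameters; key names are illustrative only.
frl_phase2_config = {
    "batch_size": 16,
    "epochs": 60,
    "warmup_epochs": 2,
    "learning_rate": 5.3e-5,
    "optimizer": "AdamW",
    "weight_decay": 0.05,
}

ldm_finetune_config = {
    "batch_size": 5,
    "diffusion_steps": 1000,
    "optimizer": "AdamW",
    "learning_rate": 5.5e-5,
    "image_resolution": (256, 256, 3),  # 256 x 256 RGB
}
```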