Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities
Authors: Jingyuan Sun, Mingxiao Li, Zijiao Chen, Yunhao Zhang, Shaonan Wang, Marie-Francine Moens
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in the 50-way-top-1 semantic classification accuracy. |
| Researcher Affiliation | Academia | Jingyuan Sun KU Leuven EMAIL; Mingxiao Li KU Leuven EMAIL; Zijiao Chen National University of Singapore EMAIL; Yunhao Zhang Chinese Academy of Science EMAIL; Shaonan Wang Chinese Academy of Science EMAIL; Marie-Francine Moens KU Leuven EMAIL |
| Pseudocode | Yes | Algorithm 1 Iterative Reasoning Module |
| Open Source Code | Yes | The code implementations will be available at https://github.com/soinx0629/vis_dec_neurips/. |
| Open Datasets | Yes | HCP: The Human Connectome Project (HCP) originally serves as an extensive exploration into the connectivity of the human brain. It offers an open-sourced database of neuroimaging and behavioral data collected from 1,200 healthy young adults within the age range of 22-35 years. Currently, it stands as the largest public resource of MRI data pertaining to the human brain, providing an excellent foundation for the pre-training of brain activation pattern representations. GOD: The Generic Object Decoding (GOD) Dataset is a specialized resource developed for fMRI-based decoding. BOLD5000: The BOLD5000 dataset is a result of an extensive slow event-related human brain fMRI study. It comprises 5,254 images, with 4,916 of them being unique. This makes it one of the most comprehensive publicly available datasets in the field. |
| Dataset Splits | Yes | The training session incorporated 1,200 images (8 per category from 150 distinct object categories). In contrast, the test session included 50 images (one from each of the 50 object categories). |
| Hardware Specification | Yes | We set the batch size to 250 and train for 140 epochs on one NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions software components and models like 'ViT-based masked autoencoder (MAE)', 'LDM', and 'AdamW optimizer' but does not specify their version numbers or the versions of any underlying programming languages or libraries. |
| Experiment Setup | Yes | For both FRL Phase 1 and Phase 2, the fMRI auto-encoder is the same ViT-based masked auto-encoder (MAE). We employed an asymmetric architecture for the fMRI auto-encoder, in which the decoder is considerably smaller with 8 layers than the encoder with 24 layers. We used a larger embedding-to-patch size ratio, specifically a patch size of 16 and an embedding dimension of 1024 for our model. We used random sparsification (RS) as a form of data augmentation, randomly selecting and setting 20% of voxels in each fMRI to zero. FRL Phase 1: For GOD subjects 1, 4, 5 and BOLD5000 CSI 1, 2, self-contrastive (γs) and cross-contrastive (γc) loss weights are both 1. The masking ratio is 0.5. For GOD subjects 2, 3 and BOLD5000 CSI 3, 4, γs = 1 and γc = 0.5, masking ratio is 0.75. We set the batch size to 250 and train for 140 epochs on one NVIDIA A100 GPU. We train with 20-epoch warming up and an initial learning rate of 2.5e-4. We optimize with AdamW and weight decay 0.05. FRL Phase 2: We set the batch size to be 16 and train for 60 epochs. We train with 2-epoch warming up. The initial learning rate is 5.3e-5. We optimize with AdamW and weight decay 0.05. Fine-tuning LDM: We conduct training with the following parameters: the batch size of 5, diffusion steps of 1000, the AdamW optimizer, a learning rate of 5.5e-5, and an image resolution of 256 × 256 × 3. |
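
The FRL Phase 1 settings and the random sparsification (RS) augmentation quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Phase1Config` and `random_sparsify` are hypothetical, a plain Python list stands in for a flattened fMRI sample, and zeroing exactly 20% of positions (rather than dropping each voxel independently with probability 0.2) is an assumption, since the excerpt does not say which variant is used.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase1Config:
    """FRL Phase 1 values quoted in the table; field names are hypothetical."""
    batch_size: int = 250
    epochs: int = 140
    warmup_epochs: int = 20
    learning_rate: float = 2.5e-4
    weight_decay: float = 0.05
    patch_size: int = 16
    embed_dim: int = 1024
    encoder_layers: int = 24
    decoder_layers: int = 8
    sparsify_ratio: float = 0.2  # fraction of voxels set to zero by RS


def random_sparsify(voxels, ratio, rng):
    """Return a copy of `voxels` with a random `ratio` of its entries set to 0.0.

    Assumption: exactly round(ratio * len(voxels)) distinct positions are zeroed.
    """
    n_zero = round(ratio * len(voxels))
    zero_idx = set(rng.sample(range(len(voxels)), n_zero))
    return [0.0 if i in zero_idx else v for i, v in enumerate(voxels)]


cfg = Phase1Config()
rng = random.Random(0)        # fixed seed so the toy run is repeatable
signal = [1.0] * 100          # toy stand-in for one flattened fMRI sample
augmented = random_sparsify(signal, cfg.sparsify_ratio, rng)
n_zeroed = augmented.count(0.0)  # number of voxels dropped by the augmentation
```

In a real training loop the same operation would typically be applied per sample to a tensor, with a fresh random selection each time the sample is drawn.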