NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction

Authors: Zixuan Gong, Guangyin Bao, Qi Zhang, Zhongwei Wan, Duoqian Miao, Shoujin Wang, Lei Zhu, Changwei Wang, Rongtao Xu, Liang Hu, Ke Liu, Yu Zhang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluated on a publicly available fMRI-video dataset, NeuroClips achieves smooth, high-fidelity video reconstruction of up to 6 s at 8 FPS, gaining significant improvements over state-of-the-art models across various metrics, e.g., a 128% improvement in SSIM and an 81% improvement in spatiotemporal metrics.
Researcher Affiliation | Academia | Tongji University; Ohio State University; University of Technology Sydney; Chinese Academy of Sciences; Beijing Anding Hospital
Pseudocode | No | The paper describes the methodology in text and mathematical formulas but does not include any clearly labeled “Pseudocode” or “Algorithm” blocks.
Open Source Code | Yes | Our project is available at https://github.com/gongzix/NeuroClips.
Open Datasets | Yes | In this study, we performed fMRI-to-video reconstruction experiments using the open-source fMRI-video dataset (cc2017 dataset, https://purr.purdue.edu/publications/2809/1) [31]. For each subject, the training and testing video clips were presented 2 and 10 times, respectively, and the testing set was averaged across trials. The dataset consists of a training set containing 18 8-minute video clips and a test set containing 5 8-minute video clips. The MRI (T1- and T2-weighted) and fMRI data (with 2 s temporal resolution) were collected using a 3-T MRI system. Thus there are 8640 training samples and 1200 testing samples of fMRI-video pairs (the arithmetic behind these counts is sketched after the table).
Dataset Splits | No | The dataset consists of a training set containing 18 8-minute video clips and a test set containing 5 8-minute video clips. The paper specifies 8640 training samples and 1200 testing samples but does not explicitly mention a “validation” split.
Hardware Specification | Yes | All experiments were conducted using a single A100 GPU.
Software Dependencies | Yes | In the inference phase, we use AnimateDiff v3 [44] with a Stable Diffusion v1.5-based motion module (a hedged pipeline sketch follows the table).
Experiment Setup | Yes | For the Semantic Reconstructor, we first train the fMRI-to-keyframe alignment for 30 epochs with a batch size of 240 and then tune the Diffusion Prior for 150 epochs with a batch size of 64. For the Perceptual Reconstructor, we train for 150 epochs with a batch size of 40. We use AdamW [64] for optimization with a learning rate of 3e-4, to which the OneCycle learning rate schedule [65] is applied. Mixing coefficients δ and µ are set to 30 and 1 (a minimal optimizer/scheduler sketch is given after the table).
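
As a sanity check on the dataset rows above, the reported 8640/1200 sample counts follow directly from the clip lengths, the 2 s fMRI temporal resolution, and the stated presentation scheme. A minimal back-of-the-envelope sketch in Python:

```python
# Derive the reported fMRI-video pair counts from the cc2017 dataset description.
TR_SECONDS = 2                              # fMRI temporal resolution
samples_per_clip = 8 * 60 // TR_SECONDS     # 8-minute clips -> 240 fMRI samples each

train_samples = 18 * samples_per_clip * 2   # 18 training clips, each presented twice
test_samples = 5 * samples_per_clip         # 5 test clips; the 10 repeats are averaged across trials

print(train_samples, test_samples)          # -> 8640 1200
```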
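The software-dependencies row names only AnimateDiff v3 with a Stable Diffusion v1.5 motion module. The sketch below shows how such a pipeline is typically assembled with the Hugging Face diffusers library; it is not the authors' inference code (NeuroClips additionally injects fMRI-decoded keyframes and perceptual latents), and the checkpoint identifiers are assumptions based on the public AnimateDiff v3 release.

```python
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter

# Load the AnimateDiff v3 motion module and attach it to a Stable Diffusion v1.5 backbone.
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-3", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", motion_adapter=adapter, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config, beta_schedule="linear", timestep_spacing="linspace"
)

# Plain text-to-video generation; NeuroClips would condition this step on fMRI-derived signals.
frames = pipe(
    prompt="a person walking on a beach",
    num_frames=16,
    num_inference_steps=25,
    guidance_scale=7.5,
).frames[0]
```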
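The experiment-setup row lists the optimizer, learning rate, and schedule; the following is a minimal PyTorch sketch of that configuration for the 30-epoch alignment stage. The model, data, and loss are illustrative placeholders, since the quote does not specify the exact modules or scheduler arguments.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR
from torch.utils.data import DataLoader, TensorDataset

# Placeholder stand-ins for the fMRI-to-keyframe alignment module and its training data.
model = nn.Linear(4096, 1024)                                   # illustrative shapes only
data = TensorDataset(torch.randn(8640, 4096), torch.randn(8640, 1024))
train_loader = DataLoader(data, batch_size=240, shuffle=True)   # batch size from the paper
loss_fn = nn.MSELoss()                                          # placeholder for the alignment loss

epochs = 30                                                     # fMRI-to-keyframe alignment stage
optimizer = AdamW(model.parameters(), lr=3e-4)
scheduler = OneCycleLR(optimizer, max_lr=3e-4, total_steps=epochs * len(train_loader))

for _ in range(epochs):
    for fmri, target in train_loader:
        loss = loss_fn(model(fmri), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()                                        # OneCycle steps once per optimizer update
```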