NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction
Authors: Zixuan Gong, Guangyin Bao, Qi Zhang, Zhongwei Wan, Duoqian Miao, Shoujin Wang, Lei Zhu, Changwei Wang, Rongtao Xu, Liang Hu, Ke Liu, Yu Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluated on a publicly available fMRI-video dataset, NeuroClips achieves smooth, high-fidelity video reconstruction of up to 6 s at 8 FPS, gaining significant improvements over state-of-the-art models in various metrics, e.g., a 128% improvement in SSIM and an 81% improvement in spatiotemporal metrics. |
| Researcher Affiliation | Academia | Tongji University; Ohio State University; University of Technology Sydney; Chinese Academy of Sciences; Beijing Anding Hospital |
| Pseudocode | No | The paper describes the methodology in text and mathematical formulas but does not include any clearly labeled “Pseudocode” or “Algorithm” blocks. |
| Open Source Code | Yes | Our project is available at https://github.com/gongzix/NeuroClips. |
| Open Datasets | Yes | In this study, we performed fMRI-to-video reconstruction experiments using the open-source fMRI-video dataset (cc2017 dataset, https://purr.purdue.edu/publications/2809/1) [31]. The dataset consists of a training set containing 18 8-minute video clips and a test set containing 5 8-minute video clips. For each subject, the training and testing video clips were presented 2 and 10 times, respectively, and the testing set was averaged across trials. The MRI (T1- and T2-weighted) and fMRI data (with 2 s temporal resolution) were collected using a 3-T MRI system. Thus there are 8640 training samples and 1200 testing samples of fMRI-video pairs (see the sample-count sketch after this table). |
| Dataset Splits | No | The dataset consists of a training set containing 18 8-minute video clips and a test set containing 5 8-minute video clips. The paper specifies 8640 training samples and 1200 testing samples, but does not explicitly mention a “validation” split. |
| Hardware Specification | Yes | All experiments were conducted using a single A100 GPU. |
| Software Dependencies | Yes | In the inference phase, we use AnimateDiff v3 [44] with a Stable Diffusion v1.5-based motion module (see the loading sketch after this table). |
| Experiment Setup | Yes | For the Semantic Reconstructor, we first train the fMRI-to-keyframe alignment for 30 epochs with a batch size of 240 and then tune the Diffusion Prior for 150 epochs with a batch size of 64. For the Perceptual Reconstructor, we train for 150 epochs with a batch size of 40. We use AdamW [64] for optimization with a learning rate of 3e-4 and a OneCycle learning-rate schedule [65]. Mixing coefficients δ and µ are set to 30 and 1 (see the optimizer sketch after this table). |
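
The training and testing sample counts quoted in the Open Datasets row follow directly from the clip lengths, the 2 s fMRI temporal resolution, and the number of presentations per subject. A minimal arithmetic sketch (values taken from the dataset description above, not from the NeuroClips code) is:

```python
# Sketch of the cc2017 sample-count arithmetic; all constants come from the
# dataset description quoted in the table above.
TR_SECONDS = 2                        # fMRI temporal resolution: one volume every 2 s
CLIP_MINUTES = 8                      # each video clip is 8 minutes long
TRAIN_CLIPS, TEST_CLIPS = 18, 5
TRAIN_REPEATS = 2                     # training presentations per subject

samples_per_clip = CLIP_MINUTES * 60 // TR_SECONDS        # 240 fMRI-video pairs per clip

train_samples = TRAIN_CLIPS * samples_per_clip * TRAIN_REPEATS   # 18 * 240 * 2 = 8640
# The 10 test presentations are averaged across trials, so they do not multiply the count.
test_samples = TEST_CLIPS * samples_per_clip                      # 5 * 240 = 1200

print(train_samples, test_samples)    # 8640 1200
```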
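
The Software Dependencies row names AnimateDiff v3 with a Stable Diffusion v1.5-based motion module as the inference backbone but does not show how it is loaded. Below is a minimal sketch using the Hugging Face diffusers API; the checkpoint identifiers (`guoyww/animatediff-motion-adapter-v1-5-3`, `runwayml/stable-diffusion-v1-5`) and the text prompt are illustrative assumptions, since NeuroClips conditions the pipeline on its fMRI-derived semantic and perceptual reconstructions rather than on a plain prompt.

```python
# Hedged sketch: loading AnimateDiff with an SD v1.5 motion module via diffusers.
# Checkpoint names are assumptions for illustration, not taken from the paper.
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-3", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", motion_adapter=adapter, torch_dtype=torch.float16)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Placeholder prompt; NeuroClips would condition generation on its reconstructions instead.
frames = pipe(prompt="a person walking on a beach", num_frames=16).frames[0]
export_to_gif(frames, "sample.gif")
```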
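
The optimization setup in the Experiment Setup row (AdamW, learning rate 3e-4, OneCycle schedule) can be sketched in PyTorch as follows. The model, loss, and data are hypothetical placeholders; only the optimizer, learning rate, schedule, and the 30-epoch / batch-size-240 keyframe-alignment phase come from the paper.

```python
# Hedged sketch of the reported optimization setup; model, data, and loss are placeholders.
import torch

model = torch.nn.Linear(4096, 768)             # stand-in for the fMRI-to-keyframe aligner
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

epochs, steps_per_epoch = 30, 36               # 8640 training samples / batch size 240 = 36
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, epochs=epochs, steps_per_epoch=steps_per_epoch)

for _ in range(epochs):
    for _ in range(steps_per_epoch):
        optimizer.zero_grad()
        loss = model(torch.randn(240, 4096)).pow(2).mean()   # dummy loss for the sketch
        loss.backward()
        optimizer.step()
        scheduler.step()                       # OneCycle steps once per batch, not per epoch
```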