Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity

Authors: Zijiao Chen, Jiaxin Qing, Juan Helen Zhou

NeurIPS 2023

Each entry below gives the reproducibility variable, its result, and the LLM's supporting response.
Research Type: Experimental
The recovered videos were evaluated with various semantic and pixel-level metrics. We achieved an average accuracy of 85% in semantic classification tasks and 0.19 in structural similarity index (SSIM), outperforming the previous state-of-the-art by 45%. We also show that our model is biologically plausible and interpretable, reflecting established physiological processes.
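For context on the pixel-level metric cited above, here is a minimal sketch of computing SSIM between a reconstructed frame and its ground-truth frame with scikit-image; the function name and array shapes are our assumptions, not the authors' evaluation code (which was unreleased).

```python
# Minimal SSIM sketch using scikit-image; not the authors' evaluation code.
import numpy as np
from skimage.metrics import structural_similarity

def frame_ssim(recon: np.ndarray, target: np.ndarray) -> float:
    """SSIM between one reconstructed frame and its ground-truth frame.

    Both inputs are assumed to be H x W x 3 uint8 images.
    """
    return structural_similarity(recon, target, channel_axis=-1, data_range=255)

# A clip-level score would average frame_ssim over all frames of a clip;
# the paper reports a mean SSIM of 0.19 on the test set.
```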
Researcher Affiliation: Academia
Zijiao Chen, National University of Singapore (zijiao.chen@u.nus.edu); Jiaxin Qing, The Chinese University of Hong Kong (jqing@ie.cuhk.edu.hk); Juan Helen Zhou, National University of Singapore (helen.zhou@nus.edu.sg)
Pseudocode: No
The paper does not include a figure, block, or section explicitly labeled “Pseudocode” or “Algorithm” with structured steps.
Open Source Code: No
The paper provides a project website (https://mind-video.com), which states that GitHub code is “Coming soon.” This indicates the code was not available at the time of publication or analysis, and no direct repository link is provided.
Open Datasets: Yes
Pre-training dataset: Human Connectome Project (HCP) 1200 Subject Release [28]. For our upstream pre-training dataset, we employed resting-state and task-evoked fMRI data from the HCP. [...] A publicly available benchmark fMRI-video dataset [11] was used, comprising fMRI and video clips.
Dataset Splits: No
The paper defines 'training data' and 'test data' with specific sizes and content. However, it does not explicitly mention a separate 'validation' dataset or split for hyperparameter tuning or early stopping during training.
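As a hedged illustration of the gap noted above, the sketch below carves a validation subset out of the training pairs; the pair count and hold-out fraction are hypothetical, since the paper specifies only train/test splits.

```python
# Hypothetical validation split; the paper itself defines only train/test.
import numpy as np

rng = np.random.default_rng(seed=0)
n_train_pairs = 4000        # hypothetical number of training fMRI-video pairs
val_fraction = 0.1          # illustrative 10% hold-out

perm = rng.permutation(n_train_pairs)
n_val = int(n_train_pairs * val_fraction)
val_idx, train_idx = perm[:n_val], perm[n_val:]
# train_idx would drive optimization; val_idx would drive hyperparameter
# tuning and early stopping, which the paper does not report doing.
```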
Hardware Specification: Yes
All parameters in the fMRI encoder pre-training are the same as [7], with eight RTX3090, while other stages are trained with one RTX3090.
Software Dependencies: No
The paper mentions various software components and models (e.g., a ViT-based fMRI encoder, Stable Diffusion V1-5, CLIP, BLIP, DDIM), but it does not specify version numbers for any of them, which reproducible software dependency information requires.
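Since no versions are given, the snippet below is a hypothetical pinned-environment sketch showing how the named components could be locked down; every version number and repo id here is an assumption, not something the paper states.

```python
# Hypothetical pins for the components the paper names; versions are assumed.
#   requirements.txt (illustrative):
#     torch==1.13.1
#     diffusers==0.14.0
#     transformers==4.26.1
from diffusers import StableDiffusionPipeline

# Public Stable Diffusion v1-5 checkpoint; the exact snapshot the authors
# augmented is not documented.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
```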
Experiment Setup: Yes
The original videos are downsampled from 30 FPS to 3 FPS for efficient training and testing, leading to 6 frames per fMRI frame. [...] A ViT-based fMRI encoder with a patch size of 16, a depth of 24, and an embedding dimension of 1024 is used. [...] But we tune the augmented Stable Diffusion for video generations at the resolution of 256×256 with 3 FPS. [...] The inference is performed with 200 DDIM [34] steps. (Further details in Supplementary Material B, including Table B.1 with hyperparameters such as learning rate, batch size, epochs, and optimizer.)
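To make the quoted numbers concrete, here is an illustrative configuration sketch; the variable names and the DDIMScheduler usage (from Hugging Face diffusers) are our assumptions, as the authors' code was not released.

```python
# Illustrative reconstruction of the reported setup; names are ours.
from diffusers import DDIMScheduler

RAW_FPS, TRAIN_FPS = 30, 3              # videos downsampled 30 FPS -> 3 FPS
FRAMES_PER_FMRI = 6                     # 6 video frames per fMRI frame
window_s = FRAMES_PER_FMRI / TRAIN_FPS  # = 2.0 s of video per fMRI frame

ENCODER = dict(patch_size=16, depth=24, embed_dim=1024)  # ViT-based fMRI encoder
RESOLUTION = (256, 256)                 # video generation resolution

scheduler = DDIMScheduler()             # DDIM sampler [34]
scheduler.set_timesteps(num_inference_steps=200)  # 200 DDIM steps at inference
```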