Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity
Authors: Zijiao Chen, Jiaxin Qing, Juan Helen Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The recovered videos were evaluated with various semantic and pixel-level metrics. We achieved an average accuracy of 85% in semantic classification tasks and 0.19 in structural similarity index (SSIM), outperforming the previous state-of-the-art by 45%. We also show that our model is biologically plausible and interpretable, reflecting established physiological processes. |
| Researcher Affiliation | Academia | Zijiao Chen, National University of Singapore (zijiao.chen@u.nus.edu); Jiaxin Qing, The Chinese University of Hong Kong (jqing@ie.cuhk.edu.hk); Juan Helen Zhou, National University of Singapore (helen.zhou@nus.edu.sg) |
| Pseudocode | No | The paper does not include a figure, block, or section explicitly labeled “Pseudocode” or “Algorithm” with structured steps. |
| Open Source Code | No | The paper provides a project website (https://mind-video.com) which states that GitHub code is “Coming soon.” This indicates the code was not available at the time of publication or analysis, and no direct repository link is provided. |
| Open Datasets | Yes | Pre-training dataset: Human Connectome Project (HCP) 1200 Subjects Release [28]: "For our upstream pre-training dataset, we employed resting-state and task-evoked fMRI data from the HCP." [...] A publicly available benchmark fMRI-video dataset [11] was used, comprising fMRI and video clips. |
| Dataset Splits | No | The paper defines 'training data' and 'test data' with specific sizes and content. However, it does not explicitly mention a separate 'validation' dataset or split for hyperparameter tuning or early stopping during training. |
| Hardware Specification | Yes | All parameters in the fMRI encoder pre-training are the same as [7], with eight RTX 3090 GPUs; the other stages are trained with one RTX 3090. |
| Software Dependencies | No | The paper mentions various software components and models (e.g., a ViT-based fMRI encoder, Stable Diffusion V1-5, CLIP, BLIP, DDIM), but it does not specify version numbers for any of them, which is required for reproducible software dependency information. |
| Experiment Setup | Yes | The original videos are downsampled from 30 FPS to 3 FPS for efficient training and testing, leading to 6 frames per fMRI frame. [...] A ViT-based fMRI encoder with a patch size of 16, a depth of 24, and an embedding dimension of 1024 is used. [...] The augmented Stable Diffusion is tuned for video generation at a resolution of 256×256 at 3 FPS. [...] Inference is performed with 200 DDIM [34] steps. (Further details are in Supplementary Material B, including Table B.1 with hyperparameters such as learning rate, batch size, epochs, and optimizer.) |
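The frame arithmetic quoted in the Experiment Setup row can be sanity-checked with a short sketch. Note the assumptions: the 2 s fMRI repetition time is inferred from "6 frames per fMRI frame" at 3 FPS (it is not stated in this table), and the config-dict key names are conventional, not taken from the paper.

```python
# Sanity check of the experiment-setup numbers quoted above.
# ASSUMPTION: a 2 s fMRI repetition time, inferred from
# "6 frames per fMRI frame" at the stated 3 FPS.

TARGET_FPS = 3          # videos downsampled from 30 FPS to 3 FPS
FMRI_TR_SECONDS = 2     # assumed duration covered by one fMRI frame

frames_per_fmri_frame = TARGET_FPS * FMRI_TR_SECONDS
print(frames_per_fmri_frame)  # 6, matching the paper's "6 frames per fMRI frame"

# Quoted ViT-based fMRI encoder hyperparameters, gathered into an
# illustrative config dict (key names are hypothetical conventions).
vit_config = {"patch_size": 16, "depth": 24, "embed_dim": 1024}
print(vit_config)
```

This makes the consistency between the downsampling rate and the per-fMRI-frame count explicit, which is useful when re-deriving the training data layout from the paper's description.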