Brant: Foundation Model for Intracranial Neural Signal
Authors: Daoze Zhang, Zhizhang Yuan, Yang Yang, Junru Chen, Jingjing Wang, Yafeng Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that Brant generalizes well to various downstream tasks, demonstrating its great potential for modeling neural recordings. Further analysis illustrates the effectiveness of the large-scale pre-trained model and the medical value of the work. |
| Researcher Affiliation | Collaboration | Daoze Zhang* (Zhejiang University, zhangdz@zju.edu.cn); Zhizhang Yuan* (Zhejiang University, zhizhangyuan@zju.edu.cn); Yang Yang (Zhejiang University, yangya@zju.edu.cn); Junru Chen (Zhejiang University, jrchen_cali@zju.edu.cn); Jingjing Wang (Zhejiang University, wjjxjj@zju.edu.cn); Yafeng Li (Nuozhu Technology Co., Ltd., yafeng.li@neurox.cn) |
| Pseudocode | No | The paper describes the model architecture and training process in prose and diagrams (Figure 2) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code and pre-trained weights are available at: https://zju-brainnet.github.io/Brant.github.io/. |
| Open Datasets | Yes | To further verify the generalization ability of Brant on more subjects with more heterogeneity, we evaluated the model on data of 31 unseen subjects from two public datasets named MAYO and FNUSA [42]. [42] Petr Nejedly, Vaclav Kremen, Vladimir Sladky, Jan Cimbalnik, Petr Klimes, Filip Plesinger, Filip Mivalt, Vojtech Travnicek, Ivo Viscor, et al. Multicenter intracranial EEG dataset for classification of graphoelements and artifactual signals. Scientific Data, 7:179, 2020. |
| Dataset Splits | No | The paper describes splitting data for "fine-tuning" and "evaluation" (e.g., "320 minutes for fine-tuning and 80 minutes for evaluation" for signal forecasting) but does not explicitly mention a separate "validation" dataset split. |
| Hardware Specification | Yes | The model is pre-trained on a Linux system with 2 CPUs (AMD EPYC 9654 96-Core Processor) and 4 GPUs (NVIDIA Tesla A100 80G) for about 2.8 days. |
| Software Dependencies | No | The paper mentions using "Adam" optimizer and "mixed precision training with FP32 and BF16" but does not provide specific version numbers for any software dependencies like libraries or frameworks. |
| Experiment Setup | Yes | For the model configurations, the temporal encoder contains a 12-layer Transformer encoder with model dimension 2048, inner dimension (FFN) 3072 and 16 attention heads, and the spatial encoder contains a 5-layer Transformer encoder with model dimension 2048, inner dimension 3072 and 16 attention heads. During pre-training, 40% of the patches in each input sample are masked with zero values uniformly at random. We take 16 input samples as a minibatch and each minibatch contains an average of 24k patches. We optimize with Adam [31], updating the model parameters every 4 steps, and the model trains for 750k updates in total. A cyclic scheduler that adopts a basic triangular cycle without amplitude scaling is utilized to adjust the learning rate during pre-training. Specifically, we set the basic learning rate as 3 × 10⁻⁶ and the maximum learning rate as 1 × 10⁻⁵, and the learning rate steps up (down) every 8k updates. |
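
The reported setup maps naturally onto standard PyTorch components. The sketch below is a minimal reconstruction of those hyperparameters, assuming PyTorch; the names `build_encoder` and `mask_patches`, and the use of a plain `nn.TransformerEncoder` in place of Brant's actual temporal/spatial encoder blocks, are illustrative assumptions rather than the authors' released code.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in the "Experiment Setup" row above.
D_MODEL, FFN_DIM, N_HEADS = 2048, 3072, 16

def build_encoder(num_layers: int) -> nn.TransformerEncoder:
    """Stack of vanilla Transformer encoder layers with the reported sizes."""
    layer = nn.TransformerEncoderLayer(
        d_model=D_MODEL,
        nhead=N_HEADS,
        dim_feedforward=FFN_DIM,
        batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=num_layers)

temporal_encoder = build_encoder(num_layers=12)  # 12-layer temporal encoder
spatial_encoder = build_encoder(num_layers=5)    # 5-layer spatial encoder

def mask_patches(patches: torch.Tensor, mask_ratio: float = 0.4) -> torch.Tensor:
    """Zero out 40% of patches per sample, chosen uniformly at random.

    Assumes `patches` has shape (batch, num_patches, patch_dim).
    """
    mask = torch.rand(patches.shape[:2], device=patches.device) < mask_ratio
    return patches.masked_fill(mask.unsqueeze(-1), 0.0)

params = list(temporal_encoder.parameters()) + list(spatial_encoder.parameters())
optimizer = torch.optim.Adam(params, lr=3e-6)

# Cyclic "triangular" schedule without amplitude scaling: the learning rate
# climbs from 3e-6 to 1e-5 over 8k updates, then descends over the next 8k.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=3e-6,
    max_lr=1e-5,
    step_size_up=8_000,
    mode="triangular",
    cycle_momentum=False,  # required because Adam has no momentum parameter
)

ACCUM_STEPS = 4          # parameters are updated every 4 steps
TOTAL_UPDATES = 750_000  # total number of parameter updates during pre-training
```

This sketch only shows how the reported quantities (layer counts, masking ratio, gradient-accumulation interval, and cyclic learning-rate bounds) fit together; for reproduction, the authors' released code and pre-trained weights at the repository linked above should be used.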