Brant: Foundation Model for Intracranial Neural Signal
Authors: Daoze Zhang, Zhizhang Yuan, Yang Yang, Junru Chen, Jingjing Wang, Yafeng Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that Brant generalizes well to various downstream tasks, demonstrating its great potential for modeling neural recordings. Further analysis illustrates the effectiveness of the large-scale pre-trained model and the medical value of the work. |
| Researcher Affiliation | Collaboration | Daoze Zhang* (Zhejiang University, zhangdz@zju.edu.cn); Zhizhang Yuan* (Zhejiang University, zhizhangyuan@zju.edu.cn); Yang Yang (Zhejiang University, yangya@zju.edu.cn); Junru Chen (Zhejiang University, jrchen_cali@zju.edu.cn); Jingjing Wang (Zhejiang University, wjjxjj@zju.edu.cn); Yafeng Li (Nuozhu Technology Co., Ltd., yafeng.li@neurox.cn) |
| Pseudocode | No | The paper describes the model architecture and training process in prose and diagrams (Figure 2) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code and pre-trained weights are available at: https://zju-brainnet.github.io/Brant.github.io/. |
| Open Datasets | Yes | To further verify the generalization ability of Brant on more subjects with more heterogeneity, we evaluated the model on data of 31 unseen subjects from two public datasets named MAYO and FNUSA [42]. [42] Petr Nejedly, Vaclav Kremen, Vladimir Sladky, Jan Cimbalnik, Petr Klimes, Filip Plesinger, Filip Mivalt, Vojtech Travnicek, Ivo Viscor, et al. Multicenter intracranial EEG dataset for classification of graphoelements and artifactual signals. Scientific Data, 7:179, 2020. |
| Dataset Splits | No | The paper describes splitting data for "fine-tuning" and "evaluation" (e.g., "320 minutes for fine-tuning and 80 minutes for evaluation" for signal forecasting) but does not explicitly mention a separate "validation" dataset split. |
| Hardware Specification | Yes | The model is pre-trained on a Linux system with 2 CPUs (AMD EPYC 9654 96-Core Processor) and 4 GPUs (NVIDIA Tesla A100 80G) for about 2.8 days. |
| Software Dependencies | No | The paper mentions using "Adam" optimizer and "mixed precision training with FP32 and BF16" but does not provide specific version numbers for any software dependencies like libraries or frameworks. |
| Experiment Setup | Yes | For the model configurations, the temporal encoder contains a 12-layer Transformer encoder with model dimension 2048, inner dimension (FFN) 3072 and 16 attention heads, and the spatial encoder contains a 5-layer Transformer encoder with model dimension 2048, inner dimension 3072 and 16 attention heads. During pre-training, 40% of the patches in each input sample are masked with zero values uniformly at random. We take 16 input samples as a minibatch and each minibatch contains an average of 24k patches. We optimize with Adam [31], updating the model parameters every 4 steps, and the model trains for 750k updates in total. A cyclic scheduler that adopts a basic triangular cycle without amplitude scaling is utilized to adjust the learning rate during pre-training. Specifically, we set the basic learning rate as 3 × 10⁻⁶ and the maximum learning rate as 1 × 10⁻⁵, and the learning rate steps up (down) every 8k updates. |
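
The reported setup maps naturally onto standard PyTorch components. The sketch below is a minimal reconstruction of those hyperparameters, assuming PyTorch; the names `build_encoder` and `mask_patches`, and the use of a plain `nn.TransformerEncoder` in place of Brant's actual temporal/spatial encoder blocks, are illustrative assumptions rather than the authors' released code.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in the "Experiment Setup" row above.
D_MODEL, FFN_DIM, N_HEADS = 2048, 3072, 16

def build_encoder(num_layers: int) -> nn.TransformerEncoder:
    """Stack of vanilla Transformer encoder layers with the reported sizes."""
    layer = nn.TransformerEncoderLayer(
        d_model=D_MODEL,
        nhead=N_HEADS,
        dim_feedforward=FFN_DIM,
        batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=num_layers)

temporal_encoder = build_encoder(num_layers=12)  # 12-layer temporal encoder
spatial_encoder = build_encoder(num_layers=5)    # 5-layer spatial encoder

def mask_patches(patches: torch.Tensor, mask_ratio: float = 0.4) -> torch.Tensor:
    """Zero out 40% of patches per sample, chosen uniformly at random.

    Assumes `patches` has shape (batch, num_patches, patch_dim).
    """
    mask = torch.rand(patches.shape[:2], device=patches.device) < mask_ratio
    return patches.masked_fill(mask.unsqueeze(-1), 0.0)

params = list(temporal_encoder.parameters()) + list(spatial_encoder.parameters())
optimizer = torch.optim.Adam(params, lr=3e-6)

# Cyclic "triangular" schedule without amplitude scaling: the learning rate
# climbs from 3e-6 to 1e-5 over 8k updates, then descends over the next 8k.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=3e-6,
    max_lr=1e-5,
    step_size_up=8_000,
    mode="triangular",
    cycle_momentum=False,  # required because Adam has no momentum parameter
)

ACCUM_STEPS = 4          # parameters are updated every 4 steps
TOTAL_UPDATES = 750_000  # total number of parameter updates during pre-training
```

This sketch only shows how the reported quantities (layer counts, masking ratio, gradient-accumulation interval, and cyclic learning-rate bounds) fit together; for reproduction, the authors' released code and pre-trained weights at the repository linked above should be used.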