Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

BrainOmni: A Brain Foundation Model for Unified EEG and MEG Signals

Authors: Qinfan Xiao, Ziyun Cui, Chi Zhang, SiQi Chen, Wen Wu, Andrew Thwaites, Alexandra Woolgar, Bowen Zhou, Chao Zhang

Venue: NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that BrainOmni outperforms both existing foundation models and state-of-the-art task-specific models on a range of downstream tasks. It also demonstrates strong generalisation to unseen EEG and MEG devices. Further analysis reveals that joint EEG-MEG (EMEG) training yields consistent improvements across both modalities. Code and checkpoints are publicly available at https://github.com/OpenTSLab/BrainOmni. Section headings: 3 Experimental Setup; 4.1 Downstream Evaluation; 4.2 Cross Device Generalisation; 4.3 EMEG Joint Pretraining.
Researcher Affiliation | Collaboration | 1 Shanghai Artificial Intelligence Laboratory, China; 2 Department of Electronic Engineering, Tsinghua University, China; 3 Department of Psychology, University of Cambridge, UK; 4 Speech Hearing and Phonetic Sciences, University College London, UK; 5 MRC Cognition and Brain Sciences Unit, University of Cambridge, UK.
Pseudocode | Yes | Algorithm 1: Power-spectral-density-based bad-channel detection
Input: raw_data, threshold = 10
Output: bad_channels
1. Compute per-channel PSD over the full recording: PSD ← COMPUTEPSD(raw_data).data
2. Stabilise and log-transform: L ← log(PSD + 10^-16)
3. Compute pairwise distances between channel spectra: for i ← 1 to C do: for j ← 1 to C do: dist[i, j] ← ‖L[i] − L[j]‖
4. Mean distance per channel: m[i] ← mean_j dist[i, j] for i = 1, ..., C
5. Identify outliers via IQR: Q1 ← percentile_25(m), Q3 ← percentile_75(m); IQR ← Q3 − Q1; upper ← Q3 + threshold × IQR, lower ← Q1 − threshold × IQR; bad_channels ← { ch_i | m[i] > upper ∨ m[i] < lower }
Open Source Code | Yes | Code and checkpoints are publicly available at https://github.com/OpenTSLab/BrainOmni.
Open Datasets | Yes | A total of 1,997 hours of EEG and 656 hours of MEG data are curated and standardised from publicly available sources for pretraining.
Dataset Splits | Yes | Among them, 85% of the data is used for training, 10% of the data is used for validation, and 5% of the data is used for testing. Additionally, one EEG dataset and one MEG dataset, which were both collected with unique device systems different from those in the training data, were excluded from the training data to evaluate the model's cross-device generalisation ability. All experiments were conducted under a 5-fold cross-validation setup, where each split allocated three folds for training, one for validation, and one for testing. Each configuration was run under two random seeds. To evaluate the model's generalisation across subjects, a strict cross-subject split strategy was applied to all datasets where subjects in the training set do not appear in the validation or test sets.
Hardware Specification | Yes | The training was conducted on 16 A100 GPUs, using the AdamW optimizer and a warmup-cosine-decay learning rate scheduler with a warmup proportion of 10%. BrainTokenizer was trained for 16 epochs with a total batch size of 512 per update step and a maximum learning rate of 2e-4. BrainOmni was trained for 32 epochs with a total batch size of 256 per update step and a maximum learning rate of 4e-4. The BrainTokenizer training took approximately 11 hours, and BrainOmni required about 14 hours for the tiny model and 18 hours for the base model. On a single A100 GPU, the training throughput was approximately 60 samples/sec, while inference achieved roughly 90 samples/sec.
Software Dependencies | No | The training was conducted on 16 A100 GPUs, using the AdamW optimizer and a warmup-cosine-decay learning rate scheduler with a warmup proportion of 10%. BrainTokenizer was trained for 16 epochs with a total batch size of 512 per update step and a maximum learning rate of 2e-4. BrainOmni was trained for 32 epochs with a total batch size of 256 per update step and a maximum learning rate of 4e-4.
Experiment Setup | Yes | We trained BrainTokenizer using 2-second segments. For training BrainOmni, we inputted 30-second data segments to allow the model to capture longer temporal dependencies. During the segmented tokenization process, we set the overlap ratio between windows to 25% to incorporate partial contextual information. The training was conducted on 16 A100 GPUs, using the AdamW optimizer and a warmup-cosine-decay learning rate scheduler with a warmup proportion of 10%. BrainTokenizer was trained for 16 epochs with a total batch size of 512 per update step and a maximum learning rate of 2e-4. BrainOmni was trained for 32 epochs with a total batch size of 256 per update step and a maximum learning rate of 4e-4. The BrainTokenizer training took approximately 11 hours, and BrainOmni required about 14 hours for the tiny model and 18 hours for the base model. For downstream evaluation, all models follow a unified training pipeline. The output embeddings from each model are first average pooled along the temporal dimension, then flattened across the remaining dimensions to serve as features, which are subsequently fed into a two-layer MLP for classification.
Hyperparameters for BrainTokenizer training (Table 10): Window length 512, N filters 32, Ratios [8, 4, 2], Kernel size 5, Last Kernel size 5, Hidden dim 256, Codebook dim 256, Codebook size 512, Num quantizers 4, Rotation trick True, Latent source number 16, Attention head number 4, Dropout 0.0, Total batch per update 512, Weight decay 1e-2, Lr 2e-4, Epochs 16, Optimizer Type AdamW, Betas [0.5, 0.9], Eps 1e-5, Scheduler Type Warmup Cosine LR, Warmup ratio 0.1, Cos min ratio 0.05.
Hyperparameters for BrainOmni training (Table 11): BrainOmni-tiny: Hidden dim 256, Attention head number 8, Attention depth 12, Lr 5e-4, Overlap ratio 0.25, Dropout 0.1, Mask ratio 0.5. BrainOmni-base: Hidden dim 512, Attention head number 16, Attention depth 12, Lr 4e-4, Overlap ratio 0.25, Dropout 0.1, Mask ratio 0.5. Common to both: Total batch per update 256, Epochs 32, Weight decay 5e-2, Optimizer Type AdamW, Betas [0.9, 0.95], Eps 1e-6, Scheduler Type Warmup Cosine LR, Warmup ratio 0.1, Cos min ratio 0.1.
Hyperparameters for downstream finetuning (Table 12): Total batch per update 128, Epochs 30, Weight decay 5e-2, Label smoothing 0.1, Optimizer Type AdamW, Betas [0.9, 0.99], Eps 1e-6, Scheduler Type Warmup Cosine LR, Warmup ratio 0.1, Cos min ratio 0.1.
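
The Pseudocode row quotes Algorithm 1 (PSD-based bad-channel detection). The steps can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation: it assumes the per-channel PSD has already been computed and is passed in as a list of spectra, it assumes the pairwise distance is the Euclidean norm (the norm symbol is not recoverable from the extracted text), and the helper names `percentile` and `detect_bad_channels` are hypothetical.

```python
import math


def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def detect_bad_channels(psd, threshold=10.0, eps=1e-16):
    """Return indices of channels whose log-spectrum is an IQR outlier.

    psd: list of C spectra, one list of non-negative floats per channel
         (assumed to be precomputed; step 1 of the algorithm is not shown).
    """
    # Step 2: stabilise and log-transform.
    log_psd = [[math.log(p + eps) for p in channel] for channel in psd]
    n_channels = len(log_psd)

    # Steps 3-4: mean pairwise distance per channel
    # (Euclidean distance is an assumption of this sketch).
    mean_dist = []
    for i in range(n_channels):
        dists = [math.dist(log_psd[i], log_psd[j]) for j in range(n_channels)]
        mean_dist.append(sum(dists) / n_channels)

    # Step 5: flag channels outside [Q1 - t*IQR, Q3 + t*IQR].
    q1, q3 = percentile(mean_dist, 25), percentile(mean_dist, 75)
    iqr = q3 - q1
    upper, lower = q3 + threshold * iqr, q1 - threshold * iqr
    return [i for i, m in enumerate(mean_dist) if m > upper or m < lower]
```

Calling `detect_bad_channels(psd)` with 19 similar spectra and one spectrum that is several orders of magnitude larger flags only the last channel. With the quoted default of threshold = 10, only very extreme channels are marked as bad.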
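
The Dataset Splits row describes 5-fold cross-validation with three folds for training, one for validation, and one for testing, under a strict cross-subject constraint. A minimal sketch of one way to build such splits follows. The fold rotation (test fold k, validation fold k + 1) and the function name `cross_subject_splits` are assumptions of this sketch; the quoted text does not say how folds were assigned.

```python
import random


def cross_subject_splits(subject_ids, n_folds=5, seed=0):
    """Partition subjects into folds and rotate them as train/val/test.

    Each split uses one fold for testing, the next fold for validation,
    and the remaining folds for training, so no subject is shared
    between the three sets of a split.
    """
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)  # seeded for repeatability
    folds = [subjects[i::n_folds] for i in range(n_folds)]

    splits = []
    for k in range(n_folds):
        val_k = (k + 1) % n_folds  # assumed rotation rule
        train = [
            s
            for i, fold in enumerate(folds)
            if i not in (k, val_k)
            for s in fold
        ]
        splits.append({"train": train, "val": folds[val_k], "test": folds[k]})
    return splits
```

Recordings would then be assigned to a set according to their subject, which keeps every recording of a given subject on one side of the split.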
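
The Experiment Setup row lists a warmup-cosine-decay scheduler with a warmup ratio and a "Cos min ratio". The exact formula is not given in the quoted text, so the sketch below assumes a common form: linear warmup from 0 to the maximum learning rate, followed by cosine decay down to `cos_min_ratio * max_lr`. The function name `warmup_cosine_lr` is hypothetical.

```python
import math


def warmup_cosine_lr(step, total_steps, max_lr, warmup_ratio=0.1, cos_min_ratio=0.1):
    """Learning rate at `step` under linear warmup then cosine decay (assumed form)."""
    warmup_steps = int(round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 up to max_lr over the warmup phase.
        return max_lr * step / warmup_steps

    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)

    # Cosine decay from max_lr down to the floor cos_min_ratio * max_lr.
    min_lr = max_lr * cos_min_ratio
    return min_lr + (max_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions, the BrainOmni-base settings from Table 11 (Lr 4e-4, Warmup ratio 0.1, Cos min ratio 0.1) would peak at 4e-4 after the first 10% of steps and end at 4e-5.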