BIOT: Biosignal Transformer for Cross-data Learning in the Wild

Authors: Chaoqi Yang, M. Brandon Westover, Jimeng Sun

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Comprehensive evaluations on EEG, electrocardiogram (ECG), and human activity sensory signals demonstrate that BIOT outperforms robust baselines in common settings and facilitates learning across multiple datasets with different formats." "3 Experiments: This section shows the strong performance of BIOT on several EEG, ECG, and human sensory datasets."
Researcher Affiliation | Academia | "Chaoqi Yang (1), M. Brandon Westover (2,3), Jimeng Sun (1); (1) University of Illinois Urbana-Champaign, (2) Harvard Medical School, (3) Beth Israel Deaconess Medical Center; chaoqiy2@illinois.edu"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our repository is public at https://github.com/ycq091044/BIOT."
Open Datasets | Yes | "We consider the following datasets in the evaluation: (i) SHHS (Zhang et al., 2018; Quan et al., 1997)... (iv) The CHB-MIT database (Shoeb, 2009)... (v) IIIC Seizure dataset is from Ge et al. (2021); Jing et al. (2023)... (vi) TUH Abnormal EEG Corpus (TUAB) (Lopez et al., 2015)... (vii) TUH EEG Events (TUEV) (Harati et al., 2015)... (viii) PTB-XL (Wagner et al., 2020)... (ix) HAR (Anguita et al., 2013)..." "The CHB-MIT database (Shoeb, 2009) is publicly available... TUH Abnormal EEG Corpus (TUAB) (Lopez et al., 2015) and TUH EEG Events (TUEV) (Harati et al., 2015) are accessible upon request at Temple University Electroencephalography (EEG) Resources. Physikalisch-Technische Bundesanstalt (PTB-XL) (Wagner et al., 2020) is a publicly available large dataset... The Human Activity Recognition (HAR) dataset (Anguita et al., 2013) is publicly available at the UCI Machine Learning Repository."
Dataset Splits | Yes | "For CHB-MIT (containing 23 patients), we first use patients 1 to 19 for training, 20 and 21 for validation, and 22 and 23 for testing. Then we flip the validation and test sets, conduct the experiments again, and report the average performance over the two settings. For IIIC Seizure, we divide patient groups into training/validation/test sets by 60%:20%:20%. For TUAB and TUEV, the training/test separation is provided by the dataset; we further divide the training patients into training and validation groups by 80%:20%. For PTB-XL, we divide patient groups into training/validation/test sets by 80%:10%:10%. The train and test sets of HAR are provided, and we further divide the test patients into validation/test by 50%:50%."
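The flipped CHB-MIT setting and the percentage-based splits are easy to mis-read, so here is a minimal Python sketch of the split logic described in the quote. The function names and integer patient IDs are our own illustration, not code from the BIOT repository.

```python
# A minimal sketch of the patient-wise splits described above; helper names
# are assumptions, not code from https://github.com/ycq091044/BIOT.

def chbmit_splits():
    """Two (train, val, test) patient-ID splits for CHB-MIT: the second
    setting flips the validation and test patients of the first."""
    train = list(range(1, 20))        # patients 1-19 for training
    yield train, [20, 21], [22, 23]   # setting 1
    yield train, [22, 23], [20, 21]   # setting 2 (val/test flipped)

def percent_split(ids, fractions=(0.6, 0.2, 0.2)):
    """Split a list of patient-group IDs by fractions, e.g. the quoted
    60%:20%:20% IIIC Seizure split (80%:10%:10% for PTB-XL)."""
    n = len(ids)
    a = int(fractions[0] * n)
    b = a + int(fractions[1] * n)
    return ids[:a], ids[a:b], ids[b:]

if __name__ == "__main__":
    for train, val, test in chbmit_splits():
        print("train:", train, "val:", val, "test:", test)
    print(percent_split(list(range(100))))  # 60/20/20 over 100 dummy IDs
```

Reported CHB-MIT numbers are then the average of the metric over the two yielded settings.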
Hardware Specification | Yes | "The experiments are implemented in Python 3.9.12, Torch 1.13.1+cu117, and PyTorch Lightning 1.6.4 on a Linux server with 512 GB of memory, 128-core CPUs, and eight RTX A6000 GPUs."
Software Dependencies | Yes | "The experiments are implemented in Python 3.9.12, Torch 1.13.1+cu117, and PyTorch Lightning 1.6.4."
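These pins can be captured in a requirements file; a minimal example consistent with the quote is below. The extra index URL is the standard PyTorch wheel index for CUDA 11.7 builds, and Python 3.9.12 itself would be managed separately (e.g. via conda), since the interpreter version is not a pip requirement.

```
# requirements.txt, version pins taken from the quote above
--extra-index-url https://download.pytorch.org/whl/cu117
torch==1.13.1+cu117
pytorch-lightning==1.6.4
```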
Experiment Setup | Yes | "For our BIOT model, we use 8 as the number of attention heads, 4 as the number of transformer layers, and T = 2 as the temperature in unsupervised pre-training by default. We use the Adam optimizer with a learning rate of 1 × 10^-3 and 1 × 10^-5 as the coefficient for L2 regularization by default. We use the PyTorch Lightning framework (with 100 as the maximum number of epochs) to handle the training, validation, and test pipeline, setting AUROC as the monitoring metric for binary classification and Cohen's Kappa as the monitoring metric for multi-class classification during validation."
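The quoted hyperparameters map directly onto a PyTorch Lightning training loop. Below is a minimal sketch, not the authors' implementation, wiring the quoted optimizer settings, epoch budget, and validation monitoring into Lightning 1.6-era APIs; the LitClassifier wrapper and the logging keys "val_auroc" and "val_kappa" are our assumptions.

```python
# A minimal sketch under the assumptions noted above.
import pytorch_lightning as pl
import torch
from pytorch_lightning.callbacks import ModelCheckpoint


class LitClassifier(pl.LightningModule):
    """Thin wrapper around a BIOT-style encoder.

    BIOT defaults per the quote: 8 attention heads, 4 transformer layers;
    T = 2 is the temperature for unsupervised pre-training (not shown here).
    """

    def __init__(self, model, binary: bool = True):
        super().__init__()
        self.model = model
        self.binary = binary

    def configure_optimizers(self):
        # Adam with learning rate 1e-3 and L2 coefficient 1e-5, as quoted.
        return torch.optim.Adam(self.parameters(), lr=1e-3, weight_decay=1e-5)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self.model(x)
        # Compute AUROC (binary) or Cohen's Kappa (multi-class) here, e.g.
        # with torchmetrics, and log it under the monitored key:
        # self.log("val_auroc" if self.binary else "val_kappa", value)


# Checkpoint on the validation metric named in the quote; higher is better
# for both AUROC and Cohen's Kappa.
monitor = "val_auroc"  # use "val_kappa" for multi-class tasks
trainer = pl.Trainer(
    max_epochs=100,  # "100 as the maximum number of epochs"
    callbacks=[ModelCheckpoint(monitor=monitor, mode="max")],
)
# trainer.fit(LitClassifier(model), train_loader, val_loader)
```

Monitoring AUROC or Cohen's Kappa through the checkpoint callback means the reported test numbers come from the checkpoint that scored best on the validation set, matching the pipeline the quote describes.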