Large-scale Training of Foundation Models for Wearable Biosignals

Authors: Salar Abbaspourazad, Oussama Elachqar, Andrew Miller, Saba Emrani, Udhyakumar Nallasamy, Ian Shapiro

ICLR 2024

Reproducibility variables, with each result and the supporting LLM response:
Research Type: Experimental. "We curated PPG and ECG datasets from AHMS that include data from 141K participants spanning 3 years. Our self-supervised learning framework includes participant-level positive pair selection, a stochastic augmentation module, and a regularized contrastive loss optimized with momentum training, and it generalizes well to both PPG and ECG modalities. We show that the pre-trained foundation models readily encode information regarding participants' demographics and health conditions. To the best of our knowledge, this is the first study that builds foundation models using large-scale PPG and ECG data collected via wearable consumer devices; prior works have commonly used smaller-size datasets collected in clinical and experimental settings."
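The participant-level positive pair selection lends itself to a short sketch. Below is a minimal illustration, assuming segments are pre-grouped by participant; `ParticipantPairDataset` and the `augment` callable are hypothetical names, not the authors' implementation:

```python
# Sketch of participant-level positive pair selection: two distinct segments
# from the same participant form a positive pair, each independently passed
# through the stochastic augmentation module. Hypothetical structure.
import random
from collections import defaultdict
from torch.utils.data import Dataset


class ParticipantPairDataset(Dataset):
    def __init__(self, segments, participant_ids, augment):
        # segments: list of 1D signal tensors; participant_ids: parallel list
        self.segments = segments
        self.augment = augment  # stochastic augmentation module (callable)
        self.by_participant = defaultdict(list)
        for idx, pid in enumerate(participant_ids):
            self.by_participant[pid].append(idx)
        self.pids = list(self.by_participant)

    def __len__(self):
        return len(self.pids)

    def __getitem__(self, i):
        # Sample two different segments from the same participant when possible.
        idxs = self.by_participant[self.pids[i]]
        a, b = random.sample(idxs, 2) if len(idxs) > 1 else (idxs[0], idxs[0])
        return self.augment(self.segments[a]), self.augment(self.segments[b])
```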
Researcher Affiliation: Industry. "Salar Abbaspourazad, Oussama Elachqar, Andrew C. Miller, Saba Emrani, Udhyakumar Nallasamy, Ian Shapiro (Apple)"
Pseudocode: Yes. "Pseudo-code of our pre-training framework is in Algorithm 1, and it is visually shown in Fig. 3."
Open Source Code: No. "Similarly, code for all data analyses may be available upon request from the corresponding author. Requests for code will be evaluated and responded to in a manner consistent with policies intended to protect participant confidentiality and language in the study protocol and ICF."
Open Datasets: No. "Based on the language within the IRB approved ICF for the Apple Heart and Movement Study, we are unable to share sensor data collected in the study."
Dataset Splits: Yes. "We used the data from 80% of the participants for training, 10% for validation during training, and 10% for occasional post-training assessments."
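Note that the split is by participant rather than by segment, so no participant's data leaks across splits. A minimal sketch of one common way to implement such a split deterministically (hashing participant IDs; the paper does not state its actual assignment mechanism):

```python
# Deterministic 80/10/10 participant-level split via hashing; an assumed
# mechanism, since the paper does not describe how participants were assigned.
import hashlib

def split_bucket(participant_id: str) -> str:
    bucket = int(hashlib.sha256(participant_id.encode()).hexdigest(), 16) % 100
    if bucket < 80:
        return "train"   # 80% of participants for training
    if bucket < 90:
        return "val"     # 10% for validation during training
    return "test"        # 10% for occasional post-training assessments
```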
Hardware Specification: Yes. "The models were trained using the Adam optimizer with gradient descent updates distributed across 32 A100 GPUs."
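The paper states only the optimizer and GPU count, not the distribution framework. A hedged sketch of one standard way to distribute gradient updates in PyTorch (DistributedDataParallel, one process per GPU):

```python
# One common setup for data-parallel training across many GPUs; DDP is an
# assumption here, since the paper does not name its distribution framework.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed(model: torch.nn.Module) -> DDP:
    dist.init_process_group(backend="nccl")  # launched with one process per GPU
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)
    return DDP(model.cuda(), device_ids=[local_rank])
```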
Software Dependencies: No. The paper mentions the Adam optimizer and describes its pseudocode as being in "PyTorch-like style," but it does not specify version numbers for any software dependencies such as PyTorch, Python, or CUDA.
Experiment Setup: Yes. "Our default encoder is an EfficientNet-style 1D convolutional neural network (Ablation 5.2.4) with 16 mobile-inverted bottleneck blocks with squeeze-and-excitation (Tan & Le, 2020), and we used a 256-dimensional embedding (the representation vector after the deep encoder) for all models used in this study. The encoder had 3.3M parameters for PPG and 2.5M for ECG. The projection head was a multi-layer perceptron with one hidden layer of 1024 units, taking the 256-dimensional embedding to a 128-dimensional representation subspace where the loss is calculated. For InfoNCE, we used a temperature value of 0.04 for both PPG and ECG modalities, and the weight for the KoLeo regularization in our objective function was set to 0.1. The models were trained using the Adam optimizer with gradient descent updates distributed across 32 A100 GPUs. Other implementation details are in Appendix A.1. Throughout this study, we used a constant momentum update rate of 0.99 for our pre-training framework and BYOL. For all model training in this study (unless otherwise stated), we used a batch size of 256 and an initial learning rate of 0.001 (0.00025 for BYOL), with step learning rate scheduling for faster convergence."
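The quoted hyperparameters pin down most of the objective. A minimal sketch of those pieces follows: the 256-to-1024-to-128 projection head, InfoNCE at temperature 0.04, KoLeo regularization weighted at 0.1, and the 0.99 momentum update. The KoLeo term follows the common nearest-neighbor entropy estimator, and helper names are illustrative rather than the authors':

```python
# Sketch of the stated loss components and updates; not the authors' code.
import torch
import torch.nn.functional as F

# Projection head per the quote: 256-d embedding -> 1024 hidden -> 128-d space.
proj_head = torch.nn.Sequential(
    torch.nn.Linear(256, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 128),
)

def info_nce(z1, z2, temperature=0.04):
    # One direction of InfoNCE; positives lie on the diagonal of the
    # (batch x batch) similarity matrix.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = (z1 @ z2.T) / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def koleo(z, eps=1e-8):
    # Kozachenko-Leonenko entropy estimator: spreads embeddings apart by
    # penalizing small nearest-neighbor distances.
    z = F.normalize(z, dim=1)
    dists = torch.cdist(z, z)
    dists.fill_diagonal_(float("inf"))
    return -torch.log(dists.min(dim=1).values + eps).mean()

def total_loss(z1, z2, koleo_weight=0.1):
    return info_nce(z1, z2) + koleo_weight * 0.5 * (koleo(z1) + koleo(z2))

@torch.no_grad()
def momentum_update(online, target, m=0.99):
    # EMA update of the momentum (target) branch from the online branch.
    for p_o, p_t in zip(online.parameters(), target.parameters()):
        p_t.mul_(m).add_(p_o.detach(), alpha=1.0 - m)

# Optimizer per the quote: Adam at lr 0.001 with step LR scheduling (the step
# size and decay factor are not given in the excerpt; values are placeholders).
optimizer = torch.optim.Adam(proj_head.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
```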