Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series
Authors: Aniruddh Raghu, Payal Chandak, Ridwan Alam, John Guttag, Collin Stultz
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on two real-world clinical datasets, where the time series contains sequences of (1) high-frequency electrocardiograms and (2) structured data from lab values and vital signs. Our experimental results indicate that pre-training with our method and then fine-tuning on downstream tasks improves performance over baselines on both datasets, and in several settings, can lead to improvements across different self-supervised loss functions. |
| Researcher Affiliation | Academia | 1Massachusetts Institute of Technology, Cambridge, MA, USA 2Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA, USA. Correspondence to: Aniruddh Raghu <araghu@mit.edu>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code at https://github.com/aniruddhraghu/smd-ssl. |
| Open Datasets | Yes | Dataset 2 is a public dataset derived from the commonly used MIMIC-III clinical database (Johnson et al., 2016; Goldberger et al., 2000) and its associated database of physiological signals (Moody et al., 2020)... The clinical database is available on PhysioNet (Goldberger et al., 2000) to credentialed users. The database of physiological signals is open-access on PhysioNet. |
| Dataset Splits | Yes | We split Dataset 1 on a per-patient level into 80/20 development/test sets and use 20% of the development set as a validation set. For Dataset 2, we use the predefined development/test split defined in the preprocessing pipeline (Harutyunyan et al., 2019), and use 20% of the development set (splitting on a per-patient basis) as a validation set. |
| Hardware Specification | Yes | All models were trained on either a single NVIDIA Quadro RTX 8000 or a single NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for any software libraries, frameworks, or programming languages used (e.g., PyTorch, TensorFlow, or the Python version). |
| Experiment Setup | Yes | We conduct pre-training for 15 epochs, using a batch size of 128, with the Adam optimizer (Kingma & Ba, 2014)... We found 1e-3 to be the most stable and best performing... We fixed the temperature of the NT-Xent loss to 0.1, following Yeche et al. (2021). |
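
The per-patient splitting described in the Dataset Splits row can be reproduced with a grouped shuffle split. The sketch below is a minimal illustration, assuming a pandas DataFrame with a `patient_id` column (a hypothetical column name); it is not the authors' released code.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

def per_patient_split(frame, group_col="patient_id", test_size=0.2, seed=0):
    """Split rows so that every sample from a given patient lands on one side."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    idx_a, idx_b = next(splitter.split(frame, groups=frame[group_col]))
    return frame.iloc[idx_a], frame.iloc[idx_b]

# Toy stand-in for a clinical table: several records per patient.
df = pd.DataFrame({
    "patient_id": [p for p in range(50) for _ in range(4)],
    "feature": range(200),
})

dev_df, test_df = per_patient_split(df, test_size=0.2)       # 80/20 development/test split
train_df, val_df = per_patient_split(dev_df, test_size=0.2)  # 20% of development as validation
```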
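The Experiment Setup row fixes the NT-Xent temperature to 0.1 and uses Adam with a batch size of 128 and a learning rate of 1e-3. The following is a minimal sketch of a SimCLR-style NT-Xent loss and one pre-training step under those reported settings; the encoder, dummy data, and function names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss, SimCLR-style.

    z1, z2: (batch, dim) embeddings of two views of the same samples.
    """
    b = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D) unit-norm embeddings
    sim = z @ z.T / temperature                          # temperature-scaled cosine similarities
    # Exclude each sample's similarity with itself from the softmax.
    sim.masked_fill_(torch.eye(2 * b, dtype=torch.bool, device=z.device), float("-inf"))
    # The positive for row i is the other view of the same sample.
    targets = torch.cat([torch.arange(b) + b, torch.arange(b)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Hypothetical pre-training step with the reported hyperparameters:
# batch size 128, Adam with learning rate 1e-3, NT-Xent temperature 0.1.
encoder = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 64))
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
view1, view2 = torch.randn(128, 32), torch.randn(128, 32)  # two augmented views (dummy data)
loss = nt_xent_loss(encoder(view1), encoder(view2), temperature=0.1)
loss.backward()
optimizer.step()
```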