Deep Contextual Clinical Prediction with Reverse Distillation

Authors: Rohan Kodialam, Rebecca Boiarsky, Justin Lim, Aditya Sai, Neil Dixit, David Sontag

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | SARD outperforms state-of-the-art methods on multiple clinical prediction outcomes, with ablation studies revealing that reverse distillation is a primary driver of these improvements. (A hedged sketch of the reverse-distillation idea is given after the table.)
Researcher Affiliation | Collaboration | MIT CSAIL & IMES; Independence Blue Cross
Pseudocode | No | No pseudocode or algorithm blocks are provided; the architecture is illustrated in Figure 1.
Open Source Code | Yes | Code is available at https://github.com/clinicalml/omop-learn.
Open Datasets | No | OMOP provides a normalized concept vocabulary, and although our dataset is not public, hundreds of health institutions with data in an OMOP CDM can use our code out-of-the-box to reproduce results on local datasets.
Dataset Splits | Yes | We split the 121,593 patients into training, validation, and test sets of size 82,955, 19,319, and 19,319 respectively. (A splitting sketch is given after the table.)
Hardware Specification | Yes | We train using a single NVIDIA K80 GPU.
Software Dependencies | No | Our algorithms are implemented in Python 3.6 and use the PyTorch autograd library (Paszke et al. 2019).
Experiment Setup | Yes | We train our deep models using an ADAM optimizer (Kingma and Ba 2014) with the hyperparameter settings β1 = 0.9, β2 = 0.98, ϵ = 10^-9 and a learning rate of η = 2 × 10^-4. A batch size of 500 patients was used for ADAM updates. SARD models are trained with d_e = 300 and K = 10; we found that validation performance did not increase with larger embedding sizes or number of convolutional kernels. We apply dropout with probability ρ = 0.05 after each self-attention block to prevent overfitting. (A configuration sketch is given after the table.)
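
Reverse distillation, named under Research Type, pretrains the deep model to match the predictions of a high-performing linear model before fine-tuning on the true outcome labels. Below is a minimal PyTorch sketch of that two-stage idea, assuming generic student and teacher modules and a mean-squared match on logits; the model classes, loss choice, and loop structure are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

def distillation_step(deep_model, linear_teacher, x, optimizer):
    # Stage 1 of reverse distillation: the frozen linear teacher provides
    # target logits, and the deep student is updated to reproduce them.
    linear_teacher.eval()
    with torch.no_grad():
        teacher_logits = linear_teacher(x)
    loss = nn.functional.mse_loss(deep_model(x), teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def fine_tune_step(deep_model, x, y, optimizer):
    # Stage 2: ordinary supervised training on the clinical outcome labels.
    logits = deep_model(x).squeeze(-1)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In the paper the teacher is a strong linear claims model; in this sketch it is simply any nn.Module that outputs logits of the same shape as the student's.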
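
For the Dataset Splits row, the following is a hedged sketch of how 121,593 patients could be partitioned into sets of 82,955 / 19,319 / 19,319 with torch.utils.data.random_split. The random assignment and the seed are assumptions; the paper's quoted text does not describe the splitting procedure.

import torch
from torch.utils.data import random_split

def split_patients(dataset, seed=0):  # the seed value is an assumption
    # Reported sizes: 82,955 train / 19,319 validation / 19,319 test,
    # out of 121,593 patients in total.
    assert len(dataset) == 82_955 + 19_319 + 19_319
    return random_split(
        dataset,
        [82_955, 19_319, 19_319],
        generator=torch.Generator().manual_seed(seed),
    )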
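
The Experiment Setup row lists concrete hyperparameters; the sketch below wires them into a PyTorch optimizer, data loader, and a dropout-after-attention block. Only the quoted numbers (learning rate 2 × 10^-4, betas (0.9, 0.98), epsilon 10^-9, batch size 500, embedding size 300, dropout 0.05) come from the paper; the SelfAttentionBlock class and its head count are hypothetical placeholders, not the SARD architecture.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def make_optimizer(model):
    # ADAM with the reported settings: beta1 = 0.9, beta2 = 0.98,
    # epsilon = 1e-9, learning rate 2e-4.
    return torch.optim.Adam(model.parameters(), lr=2e-4,
                            betas=(0.9, 0.98), eps=1e-9)

def make_train_loader(train_set):
    # Batches of 500 patients per ADAM update, as reported.
    return DataLoader(train_set, batch_size=500, shuffle=True)

class SelfAttentionBlock(nn.Module):
    # Illustrative placeholder: self-attention with embedding size d_e = 300,
    # followed by dropout with probability 0.05. The number of attention
    # heads is an assumption not stated in the quoted text.
    def __init__(self, embed_dim=300, num_heads=6):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.dropout = nn.Dropout(p=0.05)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return self.dropout(out)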