Deep Contextual Clinical Prediction with Reverse Distillation
Authors: Rohan Kodialam, Rebecca Boiarsky, Justin Lim, Aditya Sai, Neil Dixit, David Sontag
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | SARD outperforms state-of-the-art methods on multiple clinical prediction outcomes, with ablation studies revealing that reverse distillation is a primary driver of these improvements. |
| Researcher Affiliation | Collaboration | ¹MIT CSAIL & IMES, ²Independence Blue Cross |
| Pseudocode | No | No pseudocode or algorithm blocks are provided; the architecture is illustrated in Figure 1. |
| Open Source Code | Yes | Code is available at https://github.com/clinicalml/omop-learn. |
| Open Datasets | No | OMOP provides a normalized concept vocabulary, and although our dataset is not public, hundreds of health institutions with data in an OMOP CDM can use our code out-of-the-box to reproduce results on local datasets. |
| Dataset Splits | Yes | We split the 121,593 patients into training, validation, and test sets of size 82,955, 19,319, and 19,319 respectively. |
| Hardware Specification | Yes | We train using a single NVIDIA K80 GPU. |
| Software Dependencies | No | Our algorithms are implemented in Python 3.6 and use the PyTorch autograd library (Paszke et al. 2019). |
| Experiment Setup | Yes | We train our deep models using an ADAM optimizer (Kingma and Ba 2014) with the hyperparameter settings of β1 = 0.9, β2 = 0.98, ε = 10^-9, and a learning rate of η = 2 × 10^-4. A batch size of 500 patients was used for ADAM updates. SARD models are trained with d_e = 300 and K = 10; we found that validation performance did not increase with larger embedding sizes or numbers of convolutional kernels. We apply dropout with probability ρ_d = 0.05 after each self-attention block to prevent overfitting. |
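
The dataset-splits row reports an 82,955 / 19,319 / 19,319 patient-level split of 121,593 patients. The paper does not describe its splitting code, so the sketch below is only illustrative: scikit-learn's `train_test_split` and the placeholder patient IDs are assumptions.

```python
# Hypothetical sketch of an 82,955 / 19,319 / 19,319 patient split.
# The paper does not specify its splitting utility; scikit-learn and the
# placeholder patient IDs below are assumptions for illustration only.
from sklearn.model_selection import train_test_split

patient_ids = list(range(121_593))  # stand-in for real OMOP patient IDs

# Hold out 38,638 patients, then divide them evenly into validation and test sets.
train_ids, holdout_ids = train_test_split(patient_ids, test_size=38_638, random_state=0)
val_ids, test_ids = train_test_split(holdout_ids, test_size=19_319, random_state=0)

assert (len(train_ids), len(val_ids), len(test_ids)) == (82_955, 19_319, 19_319)
```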
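The experiment-setup row likewise maps onto a standard PyTorch configuration. The sketch below is a minimal illustration, not the authors' code: the stand-in model (a single self-attention block with d_model = 300) and its number of attention heads are assumptions; only the optimizer settings, batch size, and dropout probability come from the reported setup.

```python
# Minimal PyTorch sketch of the reported optimizer and dropout settings.
# The model is a hypothetical stand-in (one self-attention block with d_model = 300);
# the number of attention heads is an assumption, not taken from the paper.
import torch

model = torch.nn.TransformerEncoderLayer(
    d_model=300,       # matches the reported embedding size d_e = 300
    nhead=2,           # assumption for illustration only
    dropout=0.05,      # dropout applied after the self-attention block
    batch_first=True,
)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=2e-4,            # learning rate η = 2 × 10^-4
    betas=(0.9, 0.98),  # β1 = 0.9, β2 = 0.98
    eps=1e-9,           # ε = 10^-9
)
# Per the reported setup, each ADAM update would use a batch of 500 patients.
```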