SMART: Towards Pre-trained Missing-Aware Model for Patient Health Status Prediction
Authors: Zhihao Yu, Chu Xu, Yujie Jin, Yasha Wang, Junfeng Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the effectiveness of SMART through extensive experiments on six EHR tasks, demonstrating its superiority over state-of-the-art methods. |
| Researcher Affiliation | Academia | 1School of Computer Science, Peking University 2Center on Frontiers of Computing Studies, Peking University, Beijing, China 3National Research and Engineering Center of Software Engineering, Peking University 4Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, China 5Peking University Information Technology Institute (Tianjin Binhai) 6Nanhu Laboratory, Jiaxing, China |
| Pseudocode | Yes | Algorithm 1 Algorithm of SMART |
| Open Source Code | Yes | Our code is available at https://github.com/yzhHoward/SMART. |
| Open Datasets | Yes | The Cardiology dataset can be obtained at https://physionet.org/content/challenge-2012/1.0.0/. The Sepsis dataset can be obtained at https://physionet.org/content/challenge-2019/1.0.0/. The MIMIC-III dataset can be obtained at https://physionet.org/content/mimiciii/1.4/. |
| Dataset Splits | Yes | We randomly divide the dataset into a training set containing 80% of the patients, a validation set containing 10%, and a test set containing the remaining 10%. (A patient-level split sketch follows the table.) |
| Hardware Specification | Yes | All experiments with this model are carried out on a Linux server equipped with RTX 2080Ti GPUs. |
| Software Dependencies | Yes | PyTorch 2.1.2 and CUDA 12.1 are used to build and train our neural network. |
| Experiment Setup | Yes | The hidden size d of all modules in SMART is 32. The heads of the multi-head attention computation in temporal attention and variable attention are set to 4. The layer number L of the MART blocks is 2. The model is pre-trained for 25 epochs and fine-tuned for 25 epochs. The unfreeze epoch is set to 5. We employ the same hyperparameter configurations across the datasets in our experiments. The probability interval p for generating masks is set to (0, 0.75), and the dropout rate is set to 0.1. The exponential moving average (EMA) decay rate is set to 0.996. (A configuration sketch collecting these values follows the table.) |
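
The Dataset Splits row describes a random 80/10/10 split performed at the patient level. Below is a minimal sketch of such a split, assuming the paper's stated proportions; the `split_patients` helper and the fixed seed are illustrative assumptions, not code from the SMART repository.

```python
import numpy as np

def split_patients(patient_ids, seed=42):
    """Randomly split patients 80/10/10 into train/val/test sets.
    The helper name and the fixed seed are assumptions for illustration;
    the paper reports only the split proportions."""
    rng = np.random.default_rng(seed)
    ids = np.array(patient_ids)
    rng.shuffle(ids)
    n_train = int(0.8 * len(ids))
    n_val = int(0.1 * len(ids))
    return (ids[:n_train],                  # 80% of patients: training
            ids[n_train:n_train + n_val],   # 10% of patients: validation
            ids[n_train + n_val:])          # remaining 10%: test

# Example usage on a list of patient identifiers:
# train_ids, val_ids, test_ids = split_patients(all_patient_ids)
```

Splitting by patient identifier rather than by individual record keeps all visits of a given patient in a single partition, which avoids leakage between training and evaluation.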
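The Experiment Setup row fixes every hyperparameter reported in the paper. The sketch below collects them in one place and shows one standard way to implement the two less common pieces, the mask-probability interval and the EMA update; the names `SMARTConfig`, `sample_mask`, and `ema_update` are hypothetical, and the mask/EMA logic is a common interpretation rather than the authors' implementation.

```python
import torch
from dataclasses import dataclass

@dataclass
class SMARTConfig:
    # Values reported in the Experiment Setup row; shared across datasets.
    hidden_size: int = 32       # hidden size d of all modules
    num_heads: int = 4          # heads in temporal and variable attention
    num_layers: int = 2         # layer number L of the MART blocks
    pretrain_epochs: int = 25
    finetune_epochs: int = 25
    unfreeze_epoch: int = 5
    mask_p_low: float = 0.0     # probability interval p for generating masks
    mask_p_high: float = 0.75
    dropout: float = 0.1
    ema_decay: float = 0.996    # EMA decay rate

def sample_mask(shape, cfg: SMARTConfig):
    """Draw a masking probability p ~ U(mask_p_low, mask_p_high), then
    mask each entry independently with probability p. This reading of the
    probability interval is an assumption for illustration."""
    p = torch.empty(1).uniform_(cfg.mask_p_low, cfg.mask_p_high).item()
    return torch.rand(shape) < p  # True marks masked-out entries

@torch.no_grad()
def ema_update(ema_model: torch.nn.Module,
               model: torch.nn.Module,
               decay: float = 0.996):
    """Update the EMA copy of the model in place:
    ema <- decay * ema + (1 - decay) * current."""
    for e, m in zip(ema_model.parameters(), model.parameters()):
        e.mul_(decay).add_(m, alpha=1.0 - decay)
```

Sampling the masking probability from an interval rather than fixing it exposes the model to a range of missingness levels during pre-training; an EMA-updated copy of the network, as the decay rate suggests, is commonly used in self-supervised pre-training to provide slowly moving, stable targets.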