
DDL: Deep Dictionary Learning for Predictive Phenotyping

Authors: Tianfan Fu, Trong Nghia Hoang, Cao Xiao, Jimeng Sun

IJCAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our empirical evaluations on multiple EHR datasets demonstrated that DDL outperforms the existing predictive phenotyping methods on a wide variety of clinical tasks that require patient phenotyping.
Researcher Affiliation	Collaboration	1 Department of Computational Science and Engineering, Georgia Institute of Technology; 2 MIT-IBM Watson AI Lab, IBM Research; 3 Analytics Center of Excellence, IQVIA
Pseudocode	Yes	Algorithm 1 DDL (Tmax max no. of optimizing iterations)
Open Source Code	Yes	Code is available at https://github.com/futianfan/dictionary.
Open Datasets	Yes	on 3 healthcare datasets, Heart Failure (HF) [Ma et al., 2018], MIMIC-III and a subset of Truven MarketScan data, which contain 16794, 58000 and 72179 EHR samples, respectively.
Dataset Splits	Yes	For each experiment, we randomly generate 5 different partitions of the entire dataset into training, validation and testing sets with a 7 : 1 : 2 ratio.
Hardware Specification	Yes	Our method is implemented in TensorFlow 1.9.0 and Python 3.5, and tested on an Intel Xeon E5-2690 machine with 256G RAM and 8 NVIDIA Pascal Titan X GPUs.
Software Dependencies	Yes	Our method is implemented in TensorFlow 1.9.0 and Python 3.5
Experiment Setup	Yes	For HF dataset, the no. of hidden units of DDL's RNN component is set to 100. Its dictionary size is set to 10. Its learning rate for gradient back-propagation on the aggregate loss (Eq. 7) is set to be 1e-2. To trade-off between projection, reconstruction and prediction losses, we set ηd = 1e-1, ηc = 1 and ηr = 1e-3 in Eq. 7. For the projection loss Ld in Eq. 2, the regularization hyper-parameters are set as λ1 = 5e-2 and λ2 = 1e-3. For MIMIC-III dataset, we use the same configuration but with the following minor changes on learning rate (5e-2) and trade-off coefficients (ηd = ηc = 1 and ηr = 2e-3) between individual losses of DDL. For TRUVEN dataset, we also use the similar configuration but with the dictionary and RNN component's hidden sizes set to be 15 and 200, respectively. In addition, the trade-off coefficients in Eq. 7 are also adjusted to ηd = 1e-1 and ηc = ηr = 1. The batch sizes of DDL's stochastic gradient descent on HF and MIMIC-III are both set to be 32, while on TRUVEN, it is set to be 64 (since TRUVEN dataset is larger than the others).
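
The Dataset Splits and Experiment Setup rows can be collected into a small Python sketch for anyone attempting a rerun. This is not the authors' code. The key names, the seeds 0 to 4, and the helper split_indices are hypothetical. Any value the excerpt does not restate for a dataset is assumed to carry over from the HF configuration, including the TRUVEN learning rate, which the excerpt leaves unspecified.

```python
import random

# Values transcribed from the Experiment Setup row; key names are hypothetical.
HF = {
    "rnn_hidden_units": 100,
    "dictionary_size": 10,
    "learning_rate": 1e-2,
    "eta_d": 1e-1,   # trade-off coefficients of the aggregate loss (Eq. 7)
    "eta_c": 1.0,
    "eta_r": 1e-3,
    "lambda_1": 5e-2,  # regularization of the projection loss (Eq. 2)
    "lambda_2": 1e-3,
    "batch_size": 32,
}

# ASSUMPTION: settings not restated for a dataset are inherited from HF.
CONFIGS = {
    "HF": dict(HF),
    "MIMIC-III": {**HF, "learning_rate": 5e-2,
                  "eta_d": 1.0, "eta_c": 1.0, "eta_r": 2e-3},
    "TRUVEN": {**HF, "dictionary_size": 15, "rnn_hidden_units": 200,
               "eta_d": 1e-1, "eta_c": 1.0, "eta_r": 1.0, "batch_size": 64},
}


def split_indices(n, seed, ratios=(7, 1, 2)):
    """Shuffle range(n) and cut it into train/validation/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so each partition is repeatable
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    # The test set takes the remainder, so no sample is dropped by rounding.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Five random 7:1:2 partitions of the 16794 HF samples (seeds are assumed).
partitions = [split_indices(16794, seed) for seed in range(5)]
```

Each entry of partitions is a (train, validation, test) triple of sample indices, so a full run would loop over the five triples and train once per partition with the matching entry of CONFIGS.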