Gradient Importance Learning for Incomplete Observations

Authors: Qitong Gao, Dong Wang, Joshua David Amason, Siyang Yuan, Chenyang Tao, Ricardo Henao, Majda Hadziahmetovic, Lawrence Carin, Miroslav Pajic

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test the approach on real-world time-series (i.e., MIMIC-III), tabular data obtained from an eye clinic, and a standard dataset (i.e., MNIST), where our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
Researcher Affiliation | Academia | Duke University, USA; King Abdullah University of Science and Technology, Saudi Arabia.
Pseudocode | Yes | Algorithm 1: Gradient Importance Learning (GIL).
Open Source Code | Yes | Code available at https://github.com/gaoqitong/gradient-importance-learning.
Open Datasets | Yes | The datasets we use include i) MIMIC-III (Johnson et al., 2016), which consists of real-world EHRs obtained from intensive care units (ICUs), ii) a de-identified ophthalmic patient dataset obtained from an eye center in North America, and iii) the hand-written digits dataset MNIST (LeCun & Cortes). We also tested on a smaller-scale ICU time-series dataset from the 2012 PhysioNet challenge (Silva et al., 2012); these results can be found in Appendix D.4.
Dataset Splits | Yes | All patients selected following the above procedure are split 8:2 to form the training and testing datasets... We split all the subjects into a training cohort and a testing cohort following a ratio of 9:1.
Hardware Specification | Yes | The case studies are run on a workstation with three Nvidia Quadro RTX 6000 GPUs, each with 24 GB of memory.
Software Dependencies | No | The paper states "We use Tensorflow to implement the models and training algorithms." but does not provide a specific version number for TensorFlow or any other software dependency.
Experiment Setup | Yes | To train the imputation-free prediction models using GIL, we perform a grid search for the model learning rate α over {0.001, 0.0007, 0.0005, 0.0003, 0.0001, 0.00005, 0.00001}; the exponential decay step for α is selected from {1000, 750, 500} and the exponential decay rate for α is selected from {0.95, 0.9, 0.85, 0.8}. The actor πθ and critic Qν in GIL (i.e., Alg. 1) are trained using deep deterministic policy gradient (DDPG) (Lillicrap et al., 2015) with discount factor γ = 0.99. ... The Adam optimizer is used to train all the prediction models for the baselines... All models are trained with a batch size of 128.
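The Experiment Setup row describes a grid search over the learning rate α, its exponential decay step, and its decay rate. The sketch below enumerates that grid in plain Python; the function name `decayed_lr` is illustrative, and the staircase decay formula follows TensorFlow's `exponential_decay` convention (whether the paper uses staircase decay is an assumption, not stated in the quote).

```python
import itertools

# Grid values quoted from the paper's experiment setup.
LEARNING_RATES = [0.001, 0.0007, 0.0005, 0.0003, 0.0001, 0.00005, 0.00001]
DECAY_STEPS = [1000, 750, 500]
DECAY_RATES = [0.95, 0.9, 0.85, 0.8]

def decayed_lr(alpha, decay_steps, decay_rate, step, staircase=True):
    """Exponentially decayed learning rate: alpha * rate^(step/decay_steps).

    With staircase=True the exponent is the integer number of completed
    decay periods, matching TensorFlow's staircase exponential decay.
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return alpha * decay_rate ** exponent

# All 7 * 3 * 4 = 84 configurations the grid search would visit.
grid = list(itertools.product(LEARNING_RATES, DECAY_STEPS, DECAY_RATES))
print(len(grid))  # 84

# Example: learning rate for one configuration after 2000 training steps
# (two full decay periods of 1000 steps at rate 0.9).
print(decayed_lr(0.001, 1000, 0.9, 2000))
```

Each of the 84 configurations would then train a GIL model (batch size 128, Adam for the baselines, DDPG with γ = 0.99 for the actor–critic pair) and the best-performing configuration would be kept; that selection criterion is not specified in the quoted text.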