A Flexible Generative Model for Heterogeneous Tabular EHR with Missing Modality

Authors: Huan He, William Hao, Yuanzhe Xi, Yong Chen, Bradley Malin, Joyce Ho

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically show that our model consistently outperforms existing state-of-the-art synthetic EHR generation methods both in fidelity by up to 3.10% and utility by up to 7.16%. Additionally, we show that our method can be successfully used in privacy-sensitive settings, where the original patient-level data cannot be shared.
Researcher Affiliation Academia Huan He, Department of Biostatistics, University of Pennsylvania, huan.he@pennmedicine.upenn.edu; William Hao, Department of Computer Science, Emory University, william.hao@emory.edu; Yuanzhe Xi, Department of Mathematics, Emory University, yuanzhe.xi@emory.edu; Yong Chen, Department of Biostatistics, University of Pennsylvania, ychen123@upenn.edu; Bradley Malin, Department of Biomedical Informatics, Vanderbilt University, b.malin@vumc.org; Joyce C Ho, Department of Biostatistics, Emory University, joyce.c.ho@emory.edu
Pseudocode Yes A.4 ALGORITHM OF FLEXGEN-EHR Algorithm 1: Training of FLEXGEN-EHR
Open Source Code No The paper states that codes for *baseline models* are available online (with links provided), but does not provide an explicit statement or link for the source code of FLEXGEN-EHR itself.
Open Datasets Yes We use two real-world de-identified EHR datasets, MIMIC-III (Johnson et al., 2016) and eICU (Pollard et al., 2018).
Dataset Splits No The paper does not provide specific percentages or methodology for train/validation/test splits, nor does it explicitly mention a validation set. It mentions using 'test datasets' but not the splitting strategy.
Hardware Specification Yes For training the models, we used Adam (Kingma & Ba, 2015) with the learning rate set to 0.001, and a mini-batch of 128 on a machine equipped with one Nvidia GeForce RTX 3090 and CUDA 11.2.
Software Dependencies Yes We implemented FLEXGEN-EHR with PyTorch. For training the models, we used Adam (Kingma & Ba, 2015) with the learning rate set to 0.001, and a mini-batch of 128 on a machine equipped with one Nvidia GeForce RTX 3090 and CUDA 11.2.
Experiment Setup Yes For training the models, we used Adam (Kingma & Ba, 2015) with the learning rate set to 0.001, and a mini-batch of 128... Hyperparameters of FLEXGEN-EHR are selected after grid search. We use a timestep of 50 and a noise scheduling β from 1×10⁻⁴ to 1×10⁻².
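The reported setup (50 timesteps, noise schedule β from 1×10⁻⁴ to 1×10⁻²) matches a standard diffusion noise schedule. The sketch below illustrates such a schedule; the linear spacing of β and the toy data are assumptions for illustration, since the excerpt states only the endpoints and the number of steps.

```python
import numpy as np

# Noise schedule with the reported endpoints and step count.
# Assumption: linear spacing between the endpoints (the excerpt
# does not state the schedule type).
T = 50                                # number of diffusion timesteps
beta = np.linspace(1e-4, 1e-2, T)     # per-step noise levels beta_t
alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)         # cumulative signal retention \bar{alpha}_t

# Forward process q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I):
# noise a toy clean vector x0 at an arbitrary timestep t.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)           # hypothetical latent vector
t = 25
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * rng.standard_normal(8)

print(beta[0], beta[-1], alpha_bar[-1])
```

With a linear schedule, `alpha_bar` decays monotonically from near 1 toward 0, so later timesteps carry progressively less of the original signal.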