CONAN: Complementary Pattern Augmentation for Rare Disease Detection

Authors: Limeng Cui, Siddharth Biswal, Lucas M. Glass, Greg Lever, Jimeng Sun, Cao Xiao

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated CONAN on two disease detection tasks. For low-prevalence inflammatory bowel disease (IBD) detection, CONAN achieved 0.96 precision-recall area under the curve (PR-AUC), a 50.1% relative improvement over the best baseline. For rare disease idiopathic pulmonary fibrosis (IPF) detection, CONAN achieved 0.22 PR-AUC, a 41.3% relative improvement over the best baseline. (See the PR-AUC sketch after this table.)
Researcher Affiliation | Collaboration | 1. Analytic Center of Excellence, IQVIA, Cambridge, MA, USA; 2. College of Information Sciences and Technology, The Pennsylvania State University, PA, USA; 3. College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
Pseudocode | Yes | Algorithm 1: CONAN for Rare Disease Detection.
Open Source Code | Yes | We implement all models with Keras. Code: https://github.com/cuilimeng/CONAN
Open Datasets | No | We leverage data from IQVIA longitudinal prescription (Rx) and medical claims (Dx) databases, which include hundreds of millions of patients' clinical records.
Dataset Splits | No | We sample two imbalanced training sets for each dataset, with positive-sample ratios of 10% and 1%. For the testing set, we extract the data using the actual disease prevalence rate shown in Table 2. (See the subsampling sketch after this table.)
Hardware Specification | Yes | All methods are trained on an Ubuntu 16.04 machine with 128 GB of memory and an Nvidia Tesla P100 GPU.
Software Dependencies | No | We implement all models with Keras. The paper mentions Keras but does not provide a specific version number (e.g., Keras 2.x.x).
Experiment Setup | Yes | We set the patient embedding dimension to 128. For the complementary GAN... the complementary GAN is trained for 1000 epochs. For all models, we use RMSProp (Hinton, Srivastava, and Swersky 2012) with minibatches of 512 patients and train for 30 epochs. For a fair comparison, we use focal loss (with γ = 2 and α = 0.25) and set the output dimension to 128 for all models. (See the Keras configuration sketch below.)
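
As context for the PR-AUC figures in the Research Type row, here is a minimal sketch of how that metric is commonly computed on an imbalanced test set. The use of scikit-learn's average_precision_score and the toy data are illustrative assumptions; the paper does not say which PR-AUC implementation it used.

```python
# Illustrative PR-AUC computation on a highly imbalanced toy test set;
# average_precision_score is one common PR-AUC estimator (assumption:
# not necessarily the paper's evaluation code).
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
n = 10_000
y_true = (rng.random(n) < 0.005).astype(int)                   # ~0.5% prevalence, rare-disease-like
y_score = 0.3 * rng.random(n) + 0.7 * y_true * rng.random(n)   # toy classifier scores

print(f"PR-AUC: {average_precision_score(y_true, y_score):.3f}")
```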
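For the Dataset Splits row, a minimal sketch of subsampling a training set to a target positive ratio (10% or 1%, as reported). The function and indices are hypothetical, since the paper does not release its sampling code.

```python
# Hypothetical subsampling to a target positive ratio; not the authors' code.
import numpy as np

def sample_imbalanced(pos_idx, neg_idx, pos_ratio, rng):
    """Keep all positives, then draw negatives so positives make up pos_ratio."""
    n_pos = len(pos_idx)
    n_neg = int(n_pos * (1 - pos_ratio) / pos_ratio)  # negatives needed for the ratio
    neg_sample = rng.choice(neg_idx, size=min(n_neg, len(neg_idx)), replace=False)
    return np.concatenate([pos_idx, neg_sample])

rng = np.random.default_rng(0)
pos_idx = np.arange(1_000)               # toy positive patient indices
neg_idx = np.arange(1_000, 200_000)      # toy negative patient indices
train_10 = sample_imbalanced(pos_idx, neg_idx, 0.10, rng)  # 10% positives
train_01 = sample_imbalanced(pos_idx, neg_idx, 0.01, rng)  # 1% positives
```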
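For the Experiment Setup row, a minimal Keras sketch wiring together the reported training configuration (RMSProp, minibatches of 512, 30 epochs, focal loss with γ = 2 and α = 0.25, 128-dimensional outputs). The two-layer model body is a placeholder, not the CONAN architecture.

```python
# Hypothetical Keras training config matching the reported hyperparameters;
# the model below is a stand-in, not CONAN itself.
import tensorflow as tf
from tensorflow import keras

def focal_loss(gamma=2.0, alpha=0.25):
    """Binary focal loss (Lin et al. 2017): down-weights easy examples."""
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        eps = keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        p_t = y_true * y_pred + (1 - y_true) * (1 - y_pred)          # prob of the true class
        alpha_t = y_true * alpha + (1 - y_true) * (1 - alpha)        # class weighting
        return -tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t))
    return loss

model = keras.Sequential([
    keras.layers.Input(shape=(128,)),                 # 128-dim patient embeddings
    keras.layers.Dense(128, activation="relu"),       # placeholder body, not CONAN
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=keras.optimizers.RMSprop(),
              loss=focal_loss(gamma=2.0, alpha=0.25))
# model.fit(x_train, y_train, batch_size=512, epochs=30)  # minibatch of 512, 30 epochs
```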