CONAN: Complementary Pattern Augmentation for Rare Disease Detection
Authors: Limeng Cui, Siddharth Biswal, Lucas M. Glass, Greg Lever, Jimeng Sun, Cao Xiao (pp. 614-621)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated CONAN on two disease detection tasks. For low-prevalence inflammatory bowel disease (IBD) detection, CONAN achieved 0.96 precision-recall area under the curve (PR-AUC), a 50.1% relative improvement over the best baseline. For rare disease idiopathic pulmonary fibrosis (IPF) detection, CONAN achieved 0.22 PR-AUC, a 41.3% relative improvement over the best baseline. |
| Researcher Affiliation | Collaboration | (1) Analytic Center of Excellence, IQVIA, Cambridge, MA, USA; (2) College of Information Sciences and Technology, The Pennsylvania State University, PA, USA; (3) College of Computing, Georgia Institute of Technology, Atlanta, GA, USA |
| Pseudocode | Yes | Algorithm 1: CONAN for Rare Disease Detection. |
| Open Source Code | Yes | We implement all models with Keras.¹ (¹ https://github.com/cuilimeng/CONAN) |
| Open Datasets | No | We leverage data from the IQVIA longitudinal prescription (Rx) and medical claims (Dx) databases, which include clinical records for hundreds of millions of patients. |
| Dataset Splits | No | We sample two imbalanced training sets for each dataset, with positive-sample ratios of 10% and 1%. For the testing set, we extract the data using the actual disease prevalence rate shown in Table 2. |
| Hardware Specification | Yes | All methods are trained on an Ubuntu 16.04 machine with 128 GB of memory and an Nvidia Tesla P100 GPU. |
| Software Dependencies | No | We implement all models with Keras. The paper names Keras but does not give a specific version number (e.g., Keras 2.x.x). |
| Experiment Setup | Yes | We set the patient embedding dimension to 128. For the complementary GAN... The complementary GAN is trained for 1000 epochs. For all models, we use RMSProp (Hinton, Srivastava, and Swersky 2012) with a minibatch of 512 patients and train for 30 epochs. For a fair comparison, we use focal loss (with γ = 2 and α = 0.25) and set the output dimension to 128 for all models. |
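The Research Type row reports results as PR-AUC and as relative improvement over the best baseline. The sketch below shows one common way to compute both; it is an illustration, not the authors' evaluation code. It assumes PR-AUC is computed as step-wise average precision (precision at each ranked positive, weighted by the recall increment 1/n_pos), and the function names `average_precision` and `relative_improvement` are hypothetical. Score ties are broken by sort order, a simplification; with distinct scores this matches the step-wise definition used by scikit-learn's `average_precision_score`.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC: mean precision at the rank of each positive sample.

    Hypothetical helper; ties in `scores` are broken by sort order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("PR-AUC is undefined without positive labels")
    # Rank samples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            # Precision at this cutoff, times the recall step of 1 / n_pos.
            ap += (tp / rank) / n_pos
    return ap


def relative_improvement(model: float, baseline: float) -> float:
    """Relative gain of `model` over `baseline`, e.g. 0.5 means +50%."""
    return (model - baseline) / baseline
```

For a classifier that ranks patients at random, expected PR-AUC is roughly the positive prevalence. On a rare disease such as IPF, a PR-AUC of 0.22 should therefore be read against a very low chance level rather than against 0.5.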
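The Dataset Splits row states that each training set is drawn at a fixed positive ratio (10% or 1%) while the test set follows the real prevalence in Table 2. The paper excerpt does not say how the draw was done. The sketch below assumes uniform sampling without replacement from separate positive and negative patient pools. The function `sample_imbalanced` and its `n_total` and `seed` parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def sample_imbalanced(positives: Sequence, negatives: Sequence, pos_ratio: float,
                      n_total: int, seed: int = 0) -> List[Tuple[object, int]]:
    """Draw `n_total` labelled patients with a `pos_ratio` share of positives.

    Hypothetical helper: assumes uniform sampling without replacement.
    Returns (patient, label) pairs with label 1 for positive, 0 for negative.
    """
    if not 0.0 < pos_ratio < 1.0:
        raise ValueError("pos_ratio must lie strictly between 0 and 1")
    n_pos = round(n_total * pos_ratio)
    n_neg = n_total - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough patients to draw the requested sample")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    sample = [(p, 1) for p in rng.sample(list(positives), n_pos)]
    sample += [(q, 0) for q in rng.sample(list(negatives), n_neg)]
    rng.shuffle(sample)  # interleave positives and negatives
    return sample
```

Calling this with `pos_ratio=0.1` and `pos_ratio=0.01` gives the two training regimes. A test set would pass the disease's prevalence from Table 2 (not reproduced here) as `pos_ratio`, preferably from patients disjoint from the training draw.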
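The Experiment Setup row lists focal loss with γ = 2 and α = 0.25 for every model. The sketch below is a framework-agnostic, plain-Python version of the standard binary focal loss, FL(p_t) = −α_t (1 − p_t)^γ log(p_t). Here α_t = α for positives and 1 − α for negatives, which is the usual convention. It does not reproduce the authors' Keras loss; the function name `binary_focal_loss` and the clipping constant `eps` are assumptions.

```python
import math
from typing import Sequence


def binary_focal_loss(y_true: Sequence[int], p_pred: Sequence[float],
                      gamma: float = 2.0, alpha: float = 0.25,
                      eps: float = 1e-7) -> float:
    """Mean binary focal loss over a batch (hypothetical helper).

    y_true: 0/1 labels; p_pred: predicted probability of the positive class.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)
```

With γ = 0 the modulating factor disappears and the loss reduces to α-weighted cross-entropy. With γ = 2, confidently correct examples contribute very little, so training focuses on hard examples. This matters when positives are as scarce as in the 1% training set.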