Condensed Memory Networks for Clinical Diagnostic Inferencing

Authors: Aaditya Prakash, Siyuan Zhao, Sadid Hasan, Vivek Datla, Kathy Lee, Ashequl Qadir, Joey Liu, Oladimeji Farri

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the MIMIC-III dataset show that the proposed model outperforms other variants of memory networks to predict the most probable diagnoses given a complex clinical scenario.
Researcher Affiliation | Collaboration | Aaditya Prakash, Brandeis University, MA (aprakash@brandeis.edu); Siyuan Zhao, Worcester Polytechnic Institute, MA (szhao@wpi.edu); Sadid A. Hasan, Vivek Datla, Kathy Lee, Ashequl Qadir, Joey Liu, Oladimeji Farri, Artificial Intelligence Laboratory, Philips Research North America, Cambridge, MA ({firstname.lastname,kathy.lee 1,dimeji.farri}@philips.com)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any link or explicit statement about the availability of open-source code for the described methodology.
Open Datasets | Yes | We use the noteevents table from MIMIC-III: v1.3, which contains the unstructured free-text clinical notes for patients. MIMIC-III (Multiparameter Intelligent Monitoring in Intensive Care) (Johnson et al. 2016) is a large freely-available clinical database.
Dataset Splits | Yes | Models are trained on 80% of the data and validated on 10%. The remaining 10% is used as the test set, which is evaluated only once across all experiments with different models. (A minimal split sketch follows the table.)
Hardware Specification | No | The paper mentions 'Training time of our model for GPU implementation' but does not specify any particular GPU model or other hardware details (CPU, memory, etc.).
Software Dependencies | No | The paper mentions using 'Adam (Kingma and Ba 2014) stochastic gradient descent' for optimization, but it does not specify versions for programming languages, libraries, or other software components.
Experiment Setup | Yes | The learning rate is set to 0.001 and the batch size for each iteration to 100 for all models. For the final prediction layer, we use a fully connected layer on top of the output from equation 5 with a sigmoid activation function. The loss function is the sum of cross entropy from prediction labels and prediction memory slots using the addressing schema. Complexity of the model was penalized by adding L2 regularization to the cross entropy loss function. We use dropout (Srivastava et al. 2014) with probability 0.5 on the output-to-decision sigmoid layer and limit the norm of the gradients to be below 20. (A training-configuration sketch follows the table.)
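The Open Datasets and Dataset Splits rows report that the paper draws free-text notes from the MIMIC-III noteevents table and uses an 80/10/10 train/validation/test split. Below is a minimal Python sketch of loading the notes and producing such a split; the file name NOTEEVENTS.csv, the column names, and the fixed random seed are assumptions based on the standard MIMIC-III CSV export, not details stated in the paper.

```python
# Minimal sketch: load MIMIC-III clinical notes and make an 80/10/10 split.
# NOTEEVENTS.csv and the column names follow the MIMIC-III CSV export and
# are assumptions; the paper only names the noteevents table.
import numpy as np
import pandas as pd

notes = pd.read_csv("NOTEEVENTS.csv", usecols=["SUBJECT_ID", "HADM_ID", "TEXT"])

rng = np.random.default_rng(seed=0)              # fixed seed so the split is reproducible
idx = rng.permutation(len(notes))

n_train = int(0.8 * len(notes))
n_val = int(0.1 * len(notes))

train = notes.iloc[idx[:n_train]]
val = notes.iloc[idx[n_train:n_train + n_val]]
test = notes.iloc[idx[n_train + n_val:]]         # evaluated only once, per the paper
```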
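The Experiment Setup row lists the training hyperparameters the paper reports: Adam with learning rate 0.001, batches of 100 examples, a sigmoid output layer, a loss that sums cross entropy over prediction labels and over addressed memory slots, L2 regularization, dropout of 0.5 on the output-to-decision layer, and gradient norms limited to 20. The PyTorch sketch below shows one way those settings could be wired together; it is not the paper's implementation. The memory network itself is replaced by a placeholder prediction head, the hidden size, label count, memory-slot count, and L2 coefficient are assumptions, and the L2 penalty is expressed through Adam's weight_decay rather than an explicit term in the loss.

```python
# Sketch of the reported training configuration: Adam (lr 0.001), batches of
# 100, sigmoid output layer, summed cross-entropy losses over labels and
# memory slots, L2 regularization, dropout 0.5 on the output-to-decision
# layer, and gradient norms clipped to 20.
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """Placeholder output layer; the condensed memory network is not reproduced here."""
    def __init__(self, hidden_dim, num_labels, num_memory_slots):
        super().__init__()
        self.dropout = nn.Dropout(p=0.5)           # dropout on the output-to-decision layer
        self.label_out = nn.Linear(hidden_dim, num_labels)
        self.slot_out = nn.Linear(hidden_dim, num_memory_slots)

    def forward(self, h):
        h = self.dropout(h)
        return torch.sigmoid(self.label_out(h)), self.slot_out(h)

# Sizes and the L2 coefficient below are illustrative assumptions.
model = PredictionHead(hidden_dim=128, num_labels=50, num_memory_slots=32)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)  # weight_decay stands in for L2
label_loss_fn = nn.BCELoss()            # cross entropy over the diagnosis labels
slot_loss_fn = nn.CrossEntropyLoss()    # cross entropy over the addressed memory slots

def train_step(hidden, labels, slot_targets):
    """One update on a batch (the paper uses batches of 100 examples)."""
    optimizer.zero_grad()
    label_probs, slot_logits = model(hidden)
    loss = label_loss_fn(label_probs, labels) + slot_loss_fn(slot_logits, slot_targets)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=20.0)  # keep the gradient norm below 20
    optimizer.step()
    return loss.item()
```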