Condensed Memory Networks for Clinical Diagnostic Inferencing
Authors: Aaditya Prakash, Siyuan Zhao, Sadid Hasan, Vivek Datla, Kathy Lee, Ashequl Qadir, Joey Liu, Oladimeji Farri
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the MIMIC-III dataset show that the proposed model outperforms other variants of memory networks to predict the most probable diagnoses given a complex clinical scenario. |
| Researcher Affiliation | Collaboration | Aaditya Prakash, Brandeis University, MA, aprakash@brandeis.edu; Siyuan Zhao, Worcester Polytechnic Institute, MA, szhao@wpi.edu; Sadid A. Hasan, Vivek Datla, Kathy Lee, Ashequl Qadir, Joey Liu, Oladimeji Farri, Artificial Intelligence Laboratory, Philips Research North America, Cambridge, MA, {firstname.lastname, kathy.lee_1, dimeji.farri}@philips.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any link or explicit statement about the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We use the noteevents table from MIMIC-III: v1.3, which contains the unstructured free-text clinical notes for patients. MIMIC-III (Multiparameter Intelligent Monitoring in Intensive Care) (Johnson et al. 2016) is a large freely-available clinical database. |
| Dataset Splits | Yes | Models are trained on 80% of the data and validated on 10%. The remaining 10% is used as the test set, which is evaluated only once across all experiments with different models. (A hedged loading-and-split sketch appears below the table.) |
| Hardware Specification | No | The paper mentions 'Training time of our model for GPU implementation' but does not specify any particular GPU model or other hardware details (CPU, memory, etc.). |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma and Ba 2014) stochastic gradient descent' for optimization, but it does not specify versions for programming languages, libraries, or other software components. |
| Experiment Setup | Yes | The learning rate is set to 0.001 and the batch size for each iteration to 100 for all models. For the final prediction layer, we use a fully connected layer on top of the output from equation 5 with a sigmoid activation function. The loss function is the sum of cross entropy from prediction labels and prediction memory slots using the addressing schema. Complexity of the model was penalized by adding L2 regularization to the cross entropy loss function. We use dropout (Srivastava et al. 2014) with probability 0.5 on the output-to-decision sigmoid layer and limit the norm of the gradients to be below 20. (A hedged training-configuration sketch appears below the table.) |
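The reported data source (the MIMIC-III noteevents table) and the 80/10/10 split can be collected into a short sketch. This is a minimal reconstruction, not the authors' code: the file name `NOTEEVENTS.csv`, the column selection, and the random seed are assumptions; only the source table and the 80/10/10 proportions come from the paper.

```python
# Hedged sketch: load MIMIC-III clinical notes and reproduce an 80/10/10
# train/validation/test split. File name, columns, and seed are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

notes = pd.read_csv("NOTEEVENTS.csv", usecols=["SUBJECT_ID", "HADM_ID", "TEXT"])

# Carve off 20% of the rows, then split that held-out portion in half,
# which yields the 80/10/10 proportions reported in the paper.
train_df, heldout_df = train_test_split(notes, test_size=0.2, random_state=42)
val_df, test_df = train_test_split(heldout_df, test_size=0.5, random_state=42)

print(len(train_df), len(val_df), len(test_df))
```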
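The hyperparameters quoted in the Experiment Setup row can likewise be assembled into a small training-loop sketch. Everything beyond the quoted values is an assumption: the stand-in prediction head, the synthetic tensors, and the L2 weight of 1e-4 are placeholders, and the paper's additional cross-entropy term over predicted memory slots is omitted here.

```python
# Hedged sketch of the stated training configuration: Adam with lr 0.001,
# batch size 100, sigmoid prediction layer, dropout 0.5 on the
# output-to-decision layer, L2 penalty, and gradient norms clipped to 20.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: memory-output vectors and multi-hot diagnosis labels.
features = torch.randn(1000, 128)
labels = (torch.rand(1000, 50) > 0.95).float()
loader = DataLoader(TensorDataset(features, labels), batch_size=100, shuffle=True)

class DiagnosisHead(nn.Module):
    """Placeholder for the condensed memory network's prediction layer."""
    def __init__(self, memory_dim: int, num_labels: int):
        super().__init__()
        self.dropout = nn.Dropout(p=0.5)              # dropout on the decision layer
        self.fc = nn.Linear(memory_dim, num_labels)   # fully connected prediction layer

    def forward(self, memory_output: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.fc(self.dropout(memory_output)))

model = DiagnosisHead(memory_dim=128, num_labels=50)  # dimensions are placeholders
criterion = nn.BCELoss()                              # cross entropy over sigmoid outputs
# weight_decay supplies the L2 penalty; the value 1e-4 is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

for memory_output, target in loader:                  # batches of 100
    optimizer.zero_grad()
    loss = criterion(model(memory_output), target)    # label term only; memory-slot term omitted
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=20.0)
    optimizer.step()
```

The `weight_decay` argument is used here as a stand-in for the L2 regularization the paper adds to the cross-entropy loss; an explicit penalty term on the loss would be an equivalent formulation.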