Predicting Medications from Diagnostic Codes with Recurrent Neural Networks

Authors: Jacek M. Bajor, Thomas A. Lasko

ICLR 2017

Reproducibility checklist (each entry lists the variable assessed, the result, and the LLM's supporting response):
Research Type: Experimental
LLM Response: "Our best model was a GRU that achieved high prediction accuracy (micro-averaged AUC 0.93, Label Ranking Loss 0.076), limited by hardware constraints on model size. Additionally, examining individual cases revealed that many of the predictions marked incorrect were likely to be examples of either omitted medications or omitted billing codes, supporting our assertion of a substantial number of errors and omissions in the data, and the likelihood of models such as these to help correct them."
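Both reported metrics are standard multi-label measures available in scikit-learn. A minimal sketch of how they are computed, using a tiny hypothetical patient/medication matrix (the data below is illustrative, not from the paper):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, label_ranking_loss

# Toy multi-label setup: 4 patients x 5 candidate medications (hypothetical).
y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 1, 0, 0, 1],
                   [1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 1]])
y_score = np.array([[0.9, 0.2, 0.8, 0.1, 0.3],
                    [0.1, 0.7, 0.2, 0.3, 0.9],
                    [0.8, 0.6, 0.4, 0.2, 0.1],
                    [0.2, 0.1, 0.3, 0.9, 0.8]])

# Micro-averaged AUC pools every (patient, medication) decision into one curve.
micro_auc = roc_auc_score(y_true, y_score, average="micro")
# Label ranking loss: fraction of (positive, negative) label pairs that are
# ordered incorrectly by the scores, averaged over patients.
lrl = label_ranking_loss(y_true, y_score)
print(f"micro-averaged AUC: {micro_auc:.3f}")
print(f"label ranking loss: {lrl:.3f}")
```

With this perfectly separable toy data the AUC is 1.0 and the ranking loss 0.0; the paper's 0.93 / 0.076 reflect the same metrics on real predictions.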
Researcher Affiliation: Academia
LLM Response: "Jacek M. Bajor, Thomas A. Lasko, Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203, USA, {jacek.m.bajor,tom.lasko}@vanderbilt.edu"
Pseudocode: No
LLM Response: The paper does not contain any sections or figures explicitly labeled "Pseudocode" or "Algorithm", nor are there any structured code-like blocks describing procedures.
Open Source Code: No
LLM Response: The paper does not provide any explicit statement about releasing its source code, nor does it include a link to a code repository for the methodology described.
Open Datasets: No
LLM Response: "Our source database was the deidentified mirror of Vanderbilt's Electronic Medical Record, which contains billing codes, medication histories, laboratory test results, narrative text and medical imaging data for over 2 million patients, reaching back nearly 30 years (Roden et al., 2008)." This is an internal, de-identified database, not publicly accessible.
Dataset Splits: Yes
LLM Response: "This resulted in 610,076 complete patient records, which we divided 80/5/15 into training, validation, and final test sets."
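A minimal sketch of such an 80/5/15 split over the reported record count (the shuffling seed and index-based mechanics are assumptions for illustration; the paper does not describe how the split was performed):

```python
import numpy as np

n_records = 610_076  # complete patient records reported in the paper
rng = np.random.default_rng(0)  # fixed seed, illustrative only
indices = rng.permutation(n_records)

# 80/5/15 split into training, validation, and final test sets.
n_train = int(0.80 * n_records)
n_val = int(0.05 * n_records)
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]

print(len(train_idx), len(val_idx), len(test_idx))
```

Taking the remainder as the test set guarantees every record lands in exactly one partition despite integer truncation.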
Hardware Specification: No
LLM Response: The paper states that model size was "limited by hardware constraints" and that "RNN performance was limited by the hardware available," but it does not provide specific details about the hardware used (e.g., CPU, GPU models, memory).
Software Dependencies: No
LLM Response: The paper states "Both models were implemented using Keras (Chollet, 2015)" and "Models were implemented using scikit-learn (Pedregosa et al., 2011)". While the software is named and cited, specific version numbers for Keras or scikit-learn are not provided.
Experiment Setup: Yes
LLM Response: "The optimal hyperparameters for the model were selected in the randomized parameter optimization (Bergstra & Bengio, 2012), with the embedding dimension b = 32, number of layers, and number of nodes optimized by a few trials of human-guided search. Other optimized parameters included the fraction of dropout (between layers, input gates and recurrent connections), and L1 and L2 regularization coefficients (final values are presented in Appendix A)."
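Randomized parameter optimization in the Bergstra & Bengio sense can be sketched with scikit-learn's `ParameterSampler`. The search space below mirrors the parameter types the paper says were tuned (dropout fractions, L1/L2 coefficients), but the ranges are hypothetical; the paper's final values are in its Appendix A:

```python
from scipy.stats import loguniform, uniform
from sklearn.model_selection import ParameterSampler

# Hypothetical search space; ranges are illustrative, not from the paper.
param_space = {
    "dropout": uniform(0.0, 0.5),            # dropout between layers
    "recurrent_dropout": uniform(0.0, 0.5),  # dropout on recurrent connections
    "l1": loguniform(1e-6, 1e-2),            # L1 regularization coefficient
    "l2": loguniform(1e-6, 1e-2),            # L2 regularization coefficient
}

# Draw a fixed number of random configurations; each would be used to train
# and validate one candidate model.
samples = list(ParameterSampler(param_space, n_iter=5, random_state=0))
for cfg in samples:
    print({k: round(v, 6) for k, v in cfg.items()})
```

Sampling log-uniformly for the regularization coefficients spends the trial budget evenly across orders of magnitude, which is the usual motivation for randomized over grid search.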