Understanding Learned Models by Identifying Important Features at the Right Resolution

Authors: Kyubin Lee, Akshay Sood, Mark Craven (pp. 4155-4163)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach by analyzing random forest and LSTM neural network models learned in two challenging biomedical applications.
Researcher Affiliation | Academia | Kyubin Lee: Clinical Genomics Analysis Branch, National Cancer Center, Republic of Korea. Akshay Sood: Dept. of Computer Sciences and Dept. of Biostatistics & Medical Informatics, University of Wisconsin-Madison. Mark Craven: Dept. of Biostatistics & Medical Informatics and Dept. of Computer Sciences, University of Wisconsin-Madison.
Pseudocode | Yes | Algorithm 1: General approach to identifying important features via perturbation. (A hedged sketch of this style of perturbation analysis appears after the table.)
Open Source Code | Yes | The source code for our methods is available at https://github.com/Craven-Biostat-Lab/mihifepe.
Open Datasets | No | The paper describes the datasets used (HSV-1 data and EHR data from the University of Wisconsin Health System) but does not provide concrete access information (link, DOI, or specific citation for public access) for them.
Dataset Splits | Yes | Using 10-fold cross-validation to assess the predictive accuracy of the networks results in an area under the ROC curve (AUROC) of 0.757. (A sketch of this evaluation protocol appears after the table.)
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed machine specifications) used for running the experiments are provided.
Software Dependencies | No | The paper mentions software such as Med2Vec and model types (random forest, LSTM), but does not provide specific version numbers for any of the software dependencies used.
Experiment Setup | Yes | Our LSTM networks have a cell state of size 100 and a sigmoid output layer. The coded diagnoses, problem diagnoses, and interventions (procedures and medications) all comprise large vocabularies (6,533 for coded diagnoses, 4,398 for problem diagnoses, and 8,745 for interventions) of which only a small subset is recorded at each encounter. Therefore, we first map event vectors for each of these sets to an embedded space using Med2Vec (Choi et al. 2016), resulting in shorter, dense fixed-length vectors. Separate embeddings of size 200 were generated for each of these sets, which were then concatenated, along with the other temporal features, to produce the event representation at each timestamp in the record. (An architecture sketch appears after the table.)
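As a companion to the Pseudocode row (Algorithm 1: identifying important features via perturbation), below is a minimal sketch of one perturbation-based importance scheme: permute a feature and measure the resulting drop in AUROC. The synthetic data, the choice of permutation as the perturbation, and the AUROC-drop scoring are illustrative assumptions; the authors' actual method is implemented in the mihifepe repository linked above.

```python
# Hedged sketch of perturbation-based feature importance, in the spirit of
# Algorithm 1. Permuting a column is an assumed perturbation choice; the
# paper's own procedure lives in the mihifepe package.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


def perturbation_importance(model, X, y, seed=0):
    """Score each feature by the drop in AUROC after permuting it."""
    rng = np.random.default_rng(seed)
    baseline = roc_auc_score(y, model.predict_proba(X)[:, 1])
    scores = {}
    for j in range(X.shape[1]):
        X_perturbed = X.copy()
        X_perturbed[:, j] = rng.permutation(X_perturbed[:, j])  # break feature-label link
        perturbed = roc_auc_score(y, model.predict_proba(X_perturbed)[:, 1])
        scores[j] = baseline - perturbed  # larger drop => more important feature
    return scores


X, y = make_classification(n_samples=500, n_features=10, random_state=0)
forest = RandomForestClassifier(random_state=0).fit(X, y)
print(perturbation_importance(forest, X, y))
```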
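The Dataset Splits row quotes a 10-fold cross-validated AUROC of 0.757. A minimal sketch of that evaluation protocol, assuming scikit-learn and a synthetic stand-in for the non-public EHR data:

```python
# Sketch of 10-fold cross-validated AUROC estimation; the dataset and
# classifier are stand-ins, since the paper's EHR data are not public.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
aurocs = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=cv, scoring="roc_auc")
print(f"mean AUROC over 10 folds: {aurocs.mean():.3f}")
```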
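The Experiment Setup row describes an LSTM with a cell state of size 100, a sigmoid output layer, and per-timestamp inputs formed by concatenating three Med2Vec embeddings of size 200 with other temporal features. The PyTorch sketch below mirrors those stated dimensions; the size of the "other temporal features" vector, the class and argument names, and the use of precomputed embeddings are assumptions rather than details taken from the paper.

```python
# Hedged PyTorch sketch of the described architecture: three embedding sets of
# size 200 are concatenated with other temporal features at each timestamp and
# fed to an LSTM (hidden/cell size 100) with a sigmoid output layer.
import torch
import torch.nn as nn


class EncounterLSTM(nn.Module):
    def __init__(self, embed_dim=200, n_embedded_sets=3, other_dim=10, hidden_dim=100):
        super().__init__()
        input_dim = n_embedded_sets * embed_dim + other_dim  # 3 * 200 + other features (other_dim assumed)
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, 1)  # followed by a sigmoid

    def forward(self, embedded_events, other_features):
        # embedded_events: (batch, time, 3 * 200) precomputed Med2Vec-style embeddings
        # other_features:  (batch, time, other_dim) remaining temporal features
        x = torch.cat([embedded_events, other_features], dim=-1)
        _, (h_n, _) = self.lstm(x)               # final hidden state summarizes the record
        return torch.sigmoid(self.output(h_n[-1]))


model = EncounterLSTM()
probs = model(torch.randn(4, 12, 600), torch.randn(4, 12, 10))
print(probs.shape)  # torch.Size([4, 1])
```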