reproducibilityindex.ai

Feature Importance Explanations for Temporal Black-Box Models

Authors: Akshay Sood, Mark Craven
Pages: 8351-8360

AAAI 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate TIME by analyzing synthetic data sets and models where the ground truth pertaining to relevant features and their temporal properties is known, and by analyzing a long short term memory (LSTM) model (Hochreiter and Schmidhuber 1997) trained to predict in-hospital mortality from intensive care unit (ICU) data.
Researcher Affiliation	Academia	Department of Computer Sciences; Department of Biostatistics and Medical Informatics; University of Wisconsin-Madison, Madison, Wisconsin, U.S.A.; sood@cs.wisc.edu, craven@biostat.wisc.edu
Pseudocode	No	The paper describes the algorithms and processes used in paragraph form and through mathematical equations, but it does not include any distinct pseudocode blocks or formally labeled algorithm sections.
Open Source Code	Yes	Software as well as supplementary material for TIME are available at https://github.com/Craven-Biostat-Lab/anamod.
Open Datasets	Yes	We analyze an LSTM trained on MIMIC-III, a publicly available critical care database consisting of records of 58,976 intensive care unit (ICU) admissions (Johnson et al. 2016).
Dataset Splits	Yes	The data comprises training, validation and test sets of 14,682, 3,221 and 3,236 stays respectively, with 13.23% of the labels being positive.
Hardware Specification	No	The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies	No	The paper mentions software availability (e.g., at a GitHub link) but does not specify any particular software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup	Yes	For TIME, we set γ to 0.99 and control FDR at the 0.1 level. We sample |P_j| = 50 permutations to compute importance scores and p-values for each feature j. [...] We set γ as 0.9 and control FDR at the 0.1 level. We sample 200 permutations to compute importance scores and p-values.
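
The Experiment Setup row mentions two generic ingredients: permutation-based p-values per feature and FDR control at the 0.1 level. The sketch below illustrates those two ideas only and is not the TIME implementation or the anamod API. The toy model, the mean-squared-error loss, the data, and all function names are hypothetical, and the plain Benjamini-Hochberg step-up procedure is used as a stand-in for whatever FDR procedure the paper applies. The γ parameter from the setup is not modeled here.

```python
import random


def permutation_p_value(baseline_loss, permuted_losses):
    """One-sided empirical p-value with the usual +1 correction.

    Counts permutations whose loss is no larger than the unpermuted loss.
    """
    hits = sum(1 for loss in permuted_losses if loss <= baseline_loss)
    return (hits + 1) / (len(permuted_losses) + 1)


def benjamini_hochberg(p_values, fdr_level=0.1):
    """Return a list of booleans marking hypotheses rejected by BH step-up."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0  # largest rank whose p-value passes its threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr_level * rank / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def toy_model(row):
    # Hypothetical black box: only features 0 and 1 affect the output.
    return 2.0 * row[0] - 3.0 * row[1]


def mean_squared_error(rows, targets):
    return sum((toy_model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
targets = [toy_model(r) + rng.gauss(0, 0.1) for r in rows]
baseline = mean_squared_error(rows, targets)

num_permutations = 50  # matches the |P_j| = 50 setting quoted above
p_values = []
for j in range(4):
    losses = []
    for _ in range(num_permutations):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        losses.append(mean_squared_error(permuted, targets))
    p_values.append(permutation_p_value(baseline, losses))

relevant = benjamini_hochberg(p_values, fdr_level=0.1)
print(p_values, relevant)
```

With 50 permutations the smallest attainable p-value is 1/51, so the permutation count bounds how small a p-value can get and therefore how many features can survive the FDR threshold. In this toy setup only features 0 and 1 influence the model, so only they should be flagged.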