Feature Importance Explanations for Temporal Black-Box Models

Authors: Akshay Sood, Mark Craven
Pages: 8351-8360

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate TIME by analyzing synthetic data sets and models where the ground truth pertaining to relevant features and their temporal properties is known, and by analyzing a long short-term memory (LSTM) model (Hochreiter and Schmidhuber 1997) trained to predict in-hospital mortality from intensive care unit (ICU) data.
Researcher Affiliation | Academia | Department of Computer Sciences; Department of Biostatistics and Medical Informatics; University of Wisconsin-Madison, Madison, Wisconsin, U.S.A.; sood@cs.wisc.edu, craven@biostat.wisc.edu
Pseudocode | No | The paper describes the algorithms and processes used in paragraph form and through mathematical equations, but it does not include any distinct pseudocode blocks or formally labeled algorithm sections.
Open Source Code | Yes | Software as well as supplementary material for TIME are available at https://github.com/Craven-Biostat-Lab/anamod.
Open Datasets | Yes | We analyze an LSTM trained on MIMIC-III, a publicly available critical care database consisting of records of 58,976 intensive care unit (ICU) admissions (Johnson et al. 2016).
Dataset Splits | Yes | The data comprises training, validation and test sets of 14,682, 3,221 and 3,236 stays respectively, with 13.23% of the labels being positive.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions software availability (e.g., at a GitHub link) but does not specify any particular software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | For TIME, we set γ to 0.99 and control FDR at the 0.1 level. We sample |Pj| = 50 permutations to compute importance scores and p-values for each feature j. [...] We set γ as 0.9 and control FDR at the 0.1 level. We sample 200 permutations to compute importance scores and p-values.
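
To make the Experiment Setup entry concrete, the following is a minimal sketch of permutation-based importance scores and p-values followed by FDR control at the 0.1 level. It is not the anamod implementation and does not use its API. The function names, the mean-squared-error loss, the one-sided empirical p-value, and the use of the plain Benjamini-Hochberg procedure are all assumptions made for illustration. TIME-specific steps, such as localizing important windows and the role of γ, are not modeled here.

```python
import numpy as np


def permutation_importance(model, X, y, feature, n_permutations=50, seed=0):
    """Score one feature of X (instances x features x timesteps) by permutation.

    Hypothetical helper, not part of anamod. Returns (importance, p_value),
    where importance is the mean increase in mean squared error after
    shuffling the feature's time series across instances.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean((model(X) - y) ** 2)
    permuted_losses = np.empty(n_permutations)
    for i in range(n_permutations):
        X_perm = X.copy()
        # Shuffle whole time series across instances; each series stays intact.
        X_perm[:, feature, :] = X[rng.permutation(X.shape[0]), feature, :]
        permuted_losses[i] = np.mean((model(X_perm) - y) ** 2)
    importance = permuted_losses.mean() - baseline
    # One-sided empirical p-value with the usual +1 correction (an assumption,
    # not necessarily the test used in the paper).
    p_value = (1 + np.sum(permuted_losses <= baseline)) / (1 + n_permutations)
    return importance, p_value


def benjamini_hochberg(p_values, alpha=0.1):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        k = np.nonzero(passed)[0].max()
        rejected[order[: k + 1]] = True
    return rejected


# Toy usage: a "model" that depends only on feature 0 of three features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3, 10))
model = lambda data: data[:, 0, :].sum(axis=1)
y = model(X)
results = [permutation_importance(model, X, y, j, n_permutations=50) for j in range(3)]
p_values = [p for _, p in results]
selected = benjamini_hochberg(p_values, alpha=0.1)
```

In this toy setup only feature 0 should be flagged, since shuffling the other features leaves the model output unchanged. With 50 permutations the smallest attainable p-value under this scheme is 1/51, which is one reason the number of permutations matters when many features are tested at once.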