Feature Importance Explanations for Temporal Black-Box Models
Authors: Akshay Sood, Mark Craven (pp. 8351-8360)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TIME by analyzing synthetic data sets and models where the ground truth pertaining to relevant features and their temporal properties is known, and by analyzing a long short term memory (LSTM) model (Hochreiter and Schmidhuber 1997) trained to predict in-hospital mortality from intensive care unit (ICU) data. |
| Researcher Affiliation | Academia | Department of Computer Sciences; Department of Biostatistics and Medical Informatics; University of Wisconsin-Madison, Madison, Wisconsin, U.S.A.; sood@cs.wisc.edu, craven@biostat.wisc.edu |
| Pseudocode | No | The paper describes the algorithms and processes used in paragraph form and through mathematical equations, but it does not include any distinct pseudocode blocks or formally labeled algorithm sections. |
| Open Source Code | Yes | Software as well as supplementary material for TIME are available at https://github.com/Craven-Biostat-Lab/anamod. |
| Open Datasets | Yes | We analyze an LSTM trained on MIMIC-III, a publicly available critical care database consisting of records of 58,976 intensive care unit (ICU) admissions (Johnson et al. 2016). |
| Dataset Splits | Yes | The data comprises training, validation and test sets of 14,682, 3,221 and 3,236 stays respectively, with 13.23% of the labels being positive. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software availability (e.g., at a GitHub link) but does not specify any particular software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For TIME, we set γ to 0.99 and control FDR at the 0.1 level. We sample \|P_j\| = 50 permutations to compute importance scores and p-values for each feature j. [...] We set γ as 0.9 and control FDR at the 0.1 level. We sample 200 permutations to compute importance scores and p-values. |
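
The experiment setup above refers to sampling permutations to obtain importance scores and p-values, and to controlling the false discovery rate (FDR) at the 0.1 level. The following is a minimal, generic sketch of that idea and not the TIME algorithm or the anamod API. Everything in it is an assumption made for illustration: the function names `permutation_importance` and `benjamini_hochberg`, the squared-error loss, the add-one empirical p-value, and the use of the standard Benjamini-Hochberg procedure in place of whatever FDR procedure the paper applies.

```python
import random


def permutation_importance(model, X, y, j, num_permutations=50, seed=0):
    """Estimate importance and a p-value for feature j of a black-box model.

    model: callable mapping a list of feature rows to a list of predictions
           (hypothetical interface, chosen for this sketch).
    X:     list of rows, each a list of feature values.
    y:     list of targets.
    """
    rng = random.Random(seed)

    def mean_squared_error(rows):
        preds = model(rows)
        return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

    baseline = mean_squared_error(X)
    column = [row[j] for row in X]

    increases = []
    for _ in range(num_permutations):
        shuffled = column[:]
        rng.shuffle(shuffled)  # break the link between feature j and the target
        perturbed = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, shuffled)]
        increases.append(mean_squared_error(perturbed) - baseline)

    # Importance: mean increase in loss when feature j is permuted.
    importance = sum(increases) / num_permutations

    # Assumed empirical one-sided p-value with add-one smoothing: the share of
    # permutations that failed to increase the loss.
    non_increasing = sum(1 for d in increases if d <= 0)
    p_value = (1 + non_increasing) / (1 + num_permutations)
    return importance, p_value


def benjamini_hochberg(p_values, alpha=0.1):
    """Return a list of booleans marking hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k such that p_(k) <= alpha * k / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank

    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected
```

In this sketch, `num_permutations` plays the role of the permutation count quoted in the table (50 or 200) and `alpha` the role of the 0.1 FDR level. The sketch has no counterpart to γ and does not address the temporal properties that TIME analyzes.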