Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Feature Importance Explanations for Temporal Black-Box Models
Authors: Akshay Sood, Mark Craven | Pages: 8351-8360
AAAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TIME by analyzing synthetic data sets and models where the ground truth pertaining to relevant features and their temporal properties is known, and by analyzing a long short term memory (LSTM) model (Hochreiter and Schmidhuber 1997) trained to predict in-hospital mortality from intensive care unit (ICU) data. |
| Researcher Affiliation | Academia | Department of Computer Sciences Department of Biostatistics and Medical Informatics University of Wisconsin-Madison Madison, Wisconsin, U.S.A. EMAIL, EMAIL |
| Pseudocode | No | The paper describes the algorithms and processes used in paragraph form and through mathematical equations, but it does not include any distinct pseudocode blocks or formally labeled algorithm sections. |
| Open Source Code | Yes | Software as well as supplementary material for TIME are available at https://github.com/Craven-Biostat-Lab/anamod. |
| Open Datasets | Yes | We analyze an LSTM trained on MIMIC-III, a publicly available critical care database consisting of records of 58,976 intensive care unit (ICU) admissions (Johnson et al. 2016). |
| Dataset Splits | Yes | The data comprises training, validation and test sets of 14,682, 3,221 and 3,236 stays respectively, with 13.23% of the labels being positive. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software availability (e.g., at a GitHub link) but does not specify any particular software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For TIME, we set γ to 0.99 and control FDR at the 0.1 level. We sample \|Pj\| = 50 permutations to compute importance scores and p-values for each feature j. [...] We set γ as 0.9 and control FDR at the 0.1 level. We sample 200 permutations to compute importance scores and p-values. |
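
The Experiment Setup row mentions permutation-based p-values and FDR control at the 0.1 level. The following is a minimal, generic sketch of how those two pieces can fit together, assuming a one-sided empirical permutation p-value and the standard Benjamini-Hochberg procedure. It is not taken from the paper or from the anamod repository, whose exact test and FDR procedure may differ; the function names `permutation_p_value` and `benjamini_hochberg` and the example numbers are hypothetical.

```python
import numpy as np


def permutation_p_value(observed_loss, permuted_losses):
    """One-sided empirical p-value for a single feature (illustrative assumption).

    observed_loss: model loss on the unpermuted data.
    permuted_losses: model losses after permuting the feature, one per permutation.
    A small p-value means permuting the feature almost always increased the loss.
    """
    permuted_losses = np.asarray(permuted_losses, dtype=float)
    # Count permutations where the loss did NOT get worse than the observed loss.
    n_not_worse = np.sum(permuted_losses <= observed_loss)
    # The +1 terms keep the estimate strictly positive.
    return (n_not_worse + 1) / (permuted_losses.size + 1)


def benjamini_hochberg(p_values, alpha=0.1):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up thresholds alpha * k / m for the k-th smallest p-value.
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        k = np.max(np.nonzero(passed)[0])
        rejected[order[: k + 1]] = True
    return rejected


# Hypothetical usage: with 50 permutations that all increase the loss,
# the smallest attainable p-value is 1/51.
p_small = permutation_p_value(1.0, [2.0] * 50)
mask = benjamini_hochberg([0.01, 0.02, 0.5, 0.9], alpha=0.1)
```

Under this estimator the number of permutations bounds the resolution of the p-values: 50 permutations cannot produce a p-value below 1/51, while 200 permutations lower that floor to 1/201.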