Benchmarking Deep Learning Interpretability in Time Series Predictions
Authors: Aya Abdelsalam Ismail, Mohamed Gunady, Hector Corrada Bravo, Soheil Feizi
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose and report multiple metrics to empirically evaluate the performance of saliency methods for detecting feature importance over time using both precision (i.e., whether identified features contain meaningful signals) and recall (i.e., the number of features with signal identified as important). We design and generate multiple synthetic datasets to capture different temporal-spatial aspects (e.g., Figure 1). |
| Researcher Affiliation | Academia | Aya Abdelsalam Ismail, Mohamed Gunady, Héctor Corrada Bravo, Soheil Feizi {asalam,mgunady,sfeizi}@cs.umd.edu, hcorrada@umiacs.umd.edu Department of Computer Science, University of Maryland |
| Pseudocode | Yes | Algorithm 1: Temporal Saliency Rescaling (TSR) |
| Open Source Code | Yes | Code: https://github.com/ayaabdelsalam91/TS-Interpretability-Benchmark |
| Open Datasets | Yes | We extend the synthetic data proposed by Ismail et al. [23] for binary classification. Along with synthetic datasets, we included MNIST as a multivariate time series as a more general case (treating one of the image axes as time). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits or describe a validation set setup. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The Captum implementation of the different saliency methods was used. (This mentions a library but no specific version number.) |
| Experiment Setup | No | The paper does not provide specific details such as learning rates, batch sizes, or optimizer settings for the experimental setup. |
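The Temporal Saliency Rescaling (TSR) procedure named in the Pseudocode row can be sketched as follows. This is a minimal illustration, not the authors' implementation: `saliency_fn` stands in for any base attribution method (e.g., one of the Captum methods the paper evaluates), masking is done by zeroing, and the threshold parameter `alpha` is an assumed simplification of the algorithm's time-relevance cutoff.

```python
import numpy as np

def temporal_saliency_rescaling(x, saliency_fn, alpha=0.0):
    """Sketch of Temporal Saliency Rescaling (TSR).

    x           : (time, features) input array
    saliency_fn : callable returning a (time, features) saliency map
                  (stands in for any base saliency method)
    alpha       : assumed threshold on the time-relevance score
    """
    T, F = x.shape
    base = saliency_fn(x)

    # Step 1: time-relevance -- how much the saliency map changes
    # when an entire time step is masked (zeroed here).
    time_rel = np.zeros(T)
    for t in range(T):
        x_masked = x.copy()
        x_masked[t, :] = 0.0
        time_rel[t] = np.abs(base - saliency_fn(x_masked)).sum()

    # Step 2: feature-relevance, computed only within time steps
    # whose time-relevance exceeds the threshold.
    rescaled = np.zeros((T, F))
    for t in range(T):
        if time_rel[t] <= alpha:
            continue  # unimportant time step: saliency stays zero
        for f in range(F):
            x_masked = x.copy()
            x_masked[t, f] = 0.0
            feat_rel = np.abs(base - saliency_fn(x_masked)).sum()
            # Final score couples time- and feature-relevance.
            rescaled[t, f] = time_rel[t] * feat_rel
    return rescaled
```

The key design point captured here is TSR's two-step decomposition: it first ranks whole time steps, then distributes saliency across features only within the time steps that matter, which counteracts the tendency of standard saliency methods to smear importance across time.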