Benchmarking Deep Learning Interpretability in Time Series Predictions
Authors: Aya Abdelsalam Ismail, Mohamed Gunady, Hector Corrada Bravo, Soheil Feizi
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose and report multiple metrics to empirically evaluate the performance of saliency methods for detecting feature importance over time using both precision (i.e., whether identified features contain meaningful signals) and recall (i.e., the number of features with signal identified as important). We design and generate multiple synthetic datasets to capture different temporal-spatial aspects (e.g., Figure 1). |
| Researcher Affiliation | Academia | Aya Abdelsalam Ismail, Mohamed Gunady, Héctor Corrada Bravo, Soheil Feizi {asalam,mgunady,sfeizi}@cs.umd.edu, hcorrada@umiacs.umd.edu Department of Computer Science, University of Maryland |
| Pseudocode | Yes | Algorithm 1: Temporal Saliency Rescaling (TSR) |
| Open Source Code | Yes | Code: https://github.com/ayaabdelsalam91/TS-Interpretability-Benchmark |
| Open Datasets | Yes | We extend the synthetic data proposed by Ismail et al. [23] for binary classification. Along with synthetic datasets, we included MNIST as a multivariate time series as a more general case (treating one of the image axes as time). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits or describe a validation set setup. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The Captum implementation of the different saliency methods was used. (This mentions a library but no specific version number.) |
| Experiment Setup | No | The paper does not provide specific details such as learning rates, batch sizes, or optimizer settings for the experimental setup. |
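The Temporal Saliency Rescaling (TSR) procedure named in the Pseudocode row can be sketched as follows. This is a minimal illustration, not the authors' implementation: `saliency_fn` stands in for any base attribution method (e.g., one of the Captum methods the paper evaluates), masking is done by zeroing, and the threshold parameter `alpha` is an assumed simplification of the algorithm's time-relevance cutoff.

```python
import numpy as np

def temporal_saliency_rescaling(x, saliency_fn, alpha=0.0):
    """Sketch of Temporal Saliency Rescaling (TSR).

    x           : (time, features) input array
    saliency_fn : callable returning a (time, features) saliency map
                  (stands in for any base saliency method)
    alpha       : assumed threshold on the time-relevance score
    """
    T, F = x.shape
    base = saliency_fn(x)

    # Step 1: time-relevance -- how much the saliency map changes
    # when an entire time step is masked (zeroed here).
    time_rel = np.zeros(T)
    for t in range(T):
        x_masked = x.copy()
        x_masked[t, :] = 0.0
        time_rel[t] = np.abs(base - saliency_fn(x_masked)).sum()

    # Step 2: feature-relevance, computed only within time steps
    # whose time-relevance exceeds the threshold.
    rescaled = np.zeros((T, F))
    for t in range(T):
        if time_rel[t] <= alpha:
            continue  # unimportant time step: saliency stays zero
        for f in range(F):
            x_masked = x.copy()
            x_masked[t, f] = 0.0
            feat_rel = np.abs(base - saliency_fn(x_masked)).sum()
            # Final score couples time- and feature-relevance.
            rescaled[t, f] = time_rel[t] * feat_rel
    return rescaled
```

The key design point captured here is TSR's two-step decomposition: it first ranks whole time steps, then distributes saliency across features only within the time steps that matter, which counteracts the tendency of standard saliency methods to smear importance across time.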