Explaining Time Series Predictions with Dynamic Masks

Authors: Jonathan Crabbé, Mihaela van der Schaar

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, each listed with the assessed result and the supporting LLM response:
Research Type: Experimental
LLM Response: In this section, we evaluate the quality of our dynamic masks. There are two main difficulties to keep in mind when evaluating the quality of a saliency method. The first is that the true importance of features is usually unknown with real-world data. The second is that the performance of a saliency method in identifying relevant features depends on the black-box and its performance. To illustrate these challenges, we propose three experiments in ascending order of difficulty. In the first experiment, a white-box with known feature importance is used, so that both difficulties are avoided. In the second experiment, a black-box trained on a dataset with known feature importance is used, and hence only the second difficulty is encountered. In the third experiment, a black-box trained on a real-world clinical dataset is used, and hence both difficulties are encountered. For each experiment, more details are given in Section 3 of the supplementary materials.

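For context on how such an evaluation can be scored: in the first two experiments the truly salient entries are known, so a fitted mask can be compared against them entry by entry. Below is a minimal sketch, assuming a hypothetical binary ground-truth matrix true_saliency and a mask of the same shape (both random stand-ins here, with the MIMIC shapes T = 48 and d_X = 31), using area under the precision-recall curve as one reasonable score; the paper reports its own precision/recall-based metrics.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

# Hypothetical stand-ins: a binary ground-truth importance matrix and a
# noisy mask correlated with it, both of shape (T, d_X) = (48, 31).
rng = np.random.default_rng(0)
true_saliency = rng.integers(0, 2, size=(48, 31))
mask = np.clip(true_saliency + rng.normal(0.0, 0.5, size=(48, 31)), 0.0, 1.0)

# Score each (time, feature) entry: do high mask coefficients line up
# with the truly salient entries?
precision, recall, _ = precision_recall_curve(true_saliency.ravel(), mask.ravel())
print(f"mask AUPRC: {auc(recall, precision):.3f}")
```
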
Researcher Affiliation: Academia
LLM Response: 1 DAMTP, University of Cambridge, UK; 2 University of California, Los Angeles, USA; 3 The Alan Turing Institute, UK.

Pseudocode: No
LLM Response: The paper describes mathematical formulations and processes but does not present any structured pseudocode or algorithm blocks in its main text.

Open Source Code: Yes
LLM Response: Our implementation can be found at https://github.com/JonathanCrabbe/Dynamask.

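The repository is a PyTorch implementation of dynamic mask optimization. As a rough, self-contained illustration of the idea rather than the repository's actual API, the hypothetical helper below fits a mask to one series by blending masked entries toward a simple mean baseline and penalizing mask area; the real implementation offers richer temporal perturbation operators and constraints.

```python
import torch

def fit_mask(f, x, target, lam=0.1, epochs=500, lr=0.1):
    """Hypothetical sketch of dynamic-mask fitting, not the repository API.

    f: black-box mapping (1, T, d) inputs to class logits.
    x: one input series of shape (T, d).
    target: scalar long tensor with the class to preserve.
    """
    m = torch.full_like(x, 0.5, requires_grad=True)  # mask coefficients in [0, 1]
    baseline = x.mean(dim=0, keepdim=True)           # assumed perturbation: series average
    opt = torch.optim.Adam([m], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        x_pert = m * x + (1 - m) * baseline          # masked entries drift to the baseline
        ce = torch.nn.functional.cross_entropy(f(x_pert.unsqueeze(0)), target.view(1))
        loss = ce + lam * m.mean()                   # preserve the prediction, keep the mask small
        loss.backward()
        opt.step()
        with torch.no_grad():
            m.clamp_(0.0, 1.0)                       # project back onto [0, 1]
    return m.detach()
```
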
Open Datasets: Yes
LLM Response: We use the MIMIC-III dataset (Johnson et al., 2016), which contains the health records of 40,000 de-identified ICU patients at the Beth Israel Deaconess Medical Center. The selected data and its preprocessing are the same as in (Tonekaboni et al., 2020).

Dataset Splits: No
LLM Response: The paper mentions training data for the synthetic experiments ("800 of them are used to train") and for the MIMIC-III dataset, but it does not specify a separate validation split with clear proportions or counts for either experiment.

Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware (e.g., CPU/GPU models, memory, or cluster specifications) used to run the experiments.

Software Dependencies: No
LLM Response: The paper does not provide version numbers for any software dependencies or libraries used in the implementation or experiments (e.g., Python, PyTorch, or TensorFlow versions).

Experiment Setup: Yes
LLM Response: We generate 1000 such time series; 800 of them are used to train an RNN black-box classifier f with one hidden layer made of 200 GRU cells. ...We then fit an extremal mask to the black-box by minimizing the cross-entropy error for each test time series. ...We reproduce the MIMIC mortality experiment from (Tonekaboni et al., 2020). We fit an RNN black-box with 200 hidden GRU cells to predict the mortality of a patient based on 48 hours (T = 48) of patient features (d_X = 31).

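As a concrete reading of this setup, here is a minimal PyTorch sketch of such a black-box. GRUClassifier is a hypothetical name, the binary output size is assumed from the mortality task, and training code is omitted.

```python
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    """Sketch of the described black-box: one hidden layer of 200 GRU
    cells followed by a linear read-out (output size assumed binary)."""
    def __init__(self, d_x=31, hidden=200, n_classes=2):
        super().__init__()
        self.rnn = nn.GRU(input_size=d_x, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):               # x: (batch, T, d_x)
        _, h = self.rnn(x)              # h: (1, batch, hidden), last hidden state
        return self.head(h.squeeze(0))  # class logits

f = GRUClassifier()
x = torch.randn(8, 48, 31)              # e.g. 48 hourly steps of 31 features
print(f(x).shape)                       # torch.Size([8, 2])
```

Reading out the last hidden state summarizes the 48-step window, the usual choice for a sequence-level classifier; an extremal mask (see the sketch after the Open Source Code entry) would then be fit per test series against this model's predictions.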