Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Shedding Light on Time Series Classification using Interpretability Gated Networks
Authors: Yunshi Wen, Tengfei Ma, Ronny Luss, Debarun Bhattacharjya, Achille Fokoue, Anak Agung Julius
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed model achieves comparable performance with state-of-the-art deep learning models while additionally providing interpretable classifiers for various benchmark datasets. We further demonstrate that InterpGN outperforms state-of-the-art methods on the UEA multivariate TS classification archive (Bagnall et al., 2018), illustrate the interpretability of InterpGN on multivariate TS classification datasets, and finally apply our framework on a real-world healthcare dataset, MIMIC-III (Johnson et al., 2016). |
| Researcher Affiliation | Collaboration | 1Rensselaer Polytechnic Institute, 2Stony Brook University, 3IBM Research |
| Pseudocode | No | The paper describes the methodology using mathematical formulations (e.g., Equation 1, 2) and provides code snippets for implementation details in the appendix, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block in the main text. |
| Open Source Code | Yes | Code is available at: https://github.com/YunshiWen/InterpretGatedNetwork. |
| Open Datasets | Yes | We further demonstrate that InterpGN outperforms state-of-the-art methods on the UEA multivariate TS classification archive (Bagnall et al., 2018), illustrate the interpretability of InterpGN on multivariate TS classification datasets, and finally apply our framework on a real-world healthcare dataset, MIMIC-III (Johnson et al., 2016). |
| Dataset Splits | Yes | For each dataset, we train the models on the default training split with 5 different random seeds and report the average accuracy on the test split. ... For predicting in-hospital mortality, the data is highly imbalanced, with more than 80% positive samples (patient survived). Therefore, we randomly select a subset of 1500 positive and 1500 negative samples to evaluate our models. A sample in the dataset is a TS with M = 9 and T = 48. We further divide the subset into 80% training and 20% validation to evaluate the performance using 5-fold cross-validation. |
| Hardware Specification | Yes | All experiments run on an NVIDIA V100-SXM2-32GB GPU. |
| Software Dependencies | Yes | All experiments are implemented using Python 3.11 and PyTorch 2.4.0. |
| Experiment Setup | Yes | The default hyperparameters used to produce the results in Table 1 and Table 5 are summarized in Table 3. For each dataset, we train the models on the default training split with 5 different random seeds and report the average accuracy on the test split. |
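The Dataset Splits row describes a balanced subsampling of MIMIC-III (1500 positive and 1500 negative samples) followed by an 80%/20% train/validation division. A minimal sketch of that procedure, using only the Python standard library, is shown below; the helper names (`balanced_subset`, `train_val_split`) are hypothetical and not taken from the paper's code.

```python
import random

def balanced_subset(labels, n_per_class=1500, seed=0):
    """Randomly pick n_per_class sample indices per class label.

    Hypothetical helper mirroring the paper's balanced MIMIC-III
    subset of 1500 positive and 1500 negative samples.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    chosen = []
    for idxs in by_class.values():
        chosen.extend(rng.sample(idxs, n_per_class))
    rng.shuffle(chosen)
    return chosen

def train_val_split(indices, train_frac=0.8, seed=0):
    """Divide the subset into 80% training / 20% validation indices."""
    rng = random.Random(seed)
    idxs = list(indices)
    rng.shuffle(idxs)
    cut = int(len(idxs) * train_frac)
    return idxs[:cut], idxs[cut:]

# Example with synthetic labels: 2000 positive (1) and 2000 negative (0).
labels = [1] * 2000 + [0] * 2000
subset = balanced_subset(labels, n_per_class=1500)
train, val = train_val_split(subset)
print(len(subset), len(train), len(val))  # 3000 2400 600
```

In practice this split would be repeated per fold for the 5-fold cross-validation the paper reports; varying `seed` plays the role of the 5 different random seeds.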