Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Optimal Information Retention for Time-Series Explanations

Authors: Jinghang Yue, Jing Wang, Lu Zhang, Shuo Zhang, Da Li, Zhaoyang Ma, Youfang Lin

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive quantitative and qualitative experiments on synthetic and real-world datasets demonstrate that the proposed principle significantly improves the accuracy and completeness of explanations compared to baseline methods. The code is publicly available at https://github.com/moon2yue/ORTE.
Researcher Affiliation Academia 1 School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China; 2 Key Laboratory of Big Data & Artificial Intelligence in Transportation (Beijing Jiaotong University), Ministry of Education, Beijing, China; 3 Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing, China. Correspondence to: Jing Wang <EMAIL>.
Pseudocode Yes Appendix C: Pseudocode of adapt-STE; Algorithm 1: adapt-STE Estimator.
Open Source Code Yes The code is publicly available at https://github.com/moon2yue/ORTE.
Open Datasets Yes We evaluate our method on four carefully designed synthetic datasets with ground-truth annotations: Freq Shapes, Seq Comb-UV, Seq Comb-MV, and Low Var (Queen et al., 2024). ... Additionally, we test on four real-world datasets: ECG, arrhythmia detection (Moody & Mark, 2001); PAM, human activity recognition (Reiss & Stricker, 2012); Epilepsy, EEG seizure detection (Andrzejak et al., 2001); and Boiler, mechanical fault detection (Shohet et al., 2019).
Dataset Splits No The paper does not explicitly provide specific training/test/validation dataset splits, percentages, or sample counts. It refers to using pre-trained models and evaluating explanation methods on datasets with or without ground-truth annotations. For example, it mentions "masking the bottom p-percentile of features" and "inserting features from bottom to top" for evaluation but not the splits used for the black-box model training.
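The quoted evaluation protocol, masking the bottom p-percentile of features by attribution score, is a standard perturbation-based faithfulness check. A minimal sketch of that masking step is below; the function name, the zero baseline, and the percentile thresholding are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def mask_bottom_percentile(x, attributions, p, baseline=0.0):
    """Replace the bottom p-percentile of features, ranked by attribution
    score, with a baseline value (here: zero).

    x, attributions: arrays of the same shape (e.g. time steps x channels).
    p: percentile in [0, 100]; features whose attribution falls at or below
    the p-th percentile are masked.
    """
    threshold = np.percentile(attributions, p)
    return np.where(attributions <= threshold, baseline, x)

# Toy example: a 4-step univariate series with per-step attributions.
x = np.array([1.0, 2.0, 3.0, 4.0])
attr = np.array([0.1, 0.9, 0.2, 0.8])
masked = mask_bottom_percentile(x, attr, 50)  # masks the two least-attributed steps
```

The complementary "inserting features from bottom to top" metric would invert this: start from the all-baseline input and restore features in attribution order, tracking the model's prediction after each step.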
Hardware Specification Yes All experiments are performed on Ubuntu 18.04.6 LTS with four NVIDIA GeForce RTX 2080 GPUs.
Software Dependencies No The paper mentions "Ubuntu 18.04.6 LTS" but does not specify other software dependencies like programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or specific versions of these components.
Experiment Setup Yes Table 4. Training parameters for transformer-based predictors across all ground-truth and real-world datasets.

| Dataset        | Learning rate | Weight decay | Epochs | Num. layers | d_h | Dropout | Norm. embedding |
|----------------|---------------|--------------|--------|-------------|-----|---------|-----------------|
| FreqShape      | 0.001         | 0.1          | 100    | 1           | 16  | 0.1     | No              |
| SeqComb-UV     | 0.001         | 0.01         | 200    | 2           | 64  | 0.25    | No              |
| SeqComb-MV     | 0.0005        | 0.001        | 1000   | 2           | 128 | 0.25    | No              |
| LowVar         | 0.001         | 0.01         | 120    | 1           | 32  | 0.25    | Yes             |
| ECG            | 0.002         | 0.001        | 500    | 1           | 64  | 0.1     | Yes             |
| PAM            | 0.001         | 0.001        | 100    | 1           | 72  | 0.25    | No              |
| Epilepsy       | 0.0001        | 0.001        | 300    | 1           | 16  | 0.1     | No              |
| Boiler         | 0.001         | 0.001        | 500    | 1           | 32  | 0.25    | Yes             |
| Wafer          | 0.0001        | 0.001        | 200    | 1           | 16  | 0.1     | No              |
| FreezerRegular | 0.0001        | 0.001        | 300    | 1           | 16  | 0.1     | No              |
| Water          | 0.002         | 0.001        | 500    | 1           | 64  | 0.1     | Yes             |
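For reproduction, the per-dataset hyperparameters in Table 4 can be held in a small config mapping. The sketch below is a hypothetical layout (dictionary keys and the helper function are not from the ORTE codebase); two datasets are shown, with the remaining rows filled in the same way.

```python
# Hypothetical per-dataset training configs transcribed from Table 4.
TRAIN_PARAMS = {
    "FreqShape": dict(lr=0.001, weight_decay=0.1, epochs=100,
                      num_layers=1, d_h=16, dropout=0.1, norm_embedding=False),
    "ECG":       dict(lr=0.002, weight_decay=0.001, epochs=500,
                      num_layers=1, d_h=64, dropout=0.1, norm_embedding=True),
    # ... remaining datasets follow the same pattern.
}

def optimizer_config(dataset):
    """Return the optimizer-related settings for a dataset."""
    p = TRAIN_PARAMS[dataset]
    return {"lr": p["lr"], "weight_decay": p["weight_decay"]}
```

Keeping hyperparameters in one table-shaped structure makes it easy to diff a reproduction run against the reported settings.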