Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Optimal Information Retention for Time-Series Explanations

Authors: Jinghang Yue, Jing Wang, Lu Zhang, Shuo Zhang, Da Li, Zhaoyang Ma, Youfang Lin

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive quantitative and qualitative experiments on synthetic and real-world datasets demonstrate that the proposed principle significantly improves the accuracy and completeness of explanations compared to baseline methods. The code is publicly available at https://github.com/moon2yue/ORTE.
Researcher Affiliation Academia 1 School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China; 2 Key Laboratory of Big Data & Artificial Intelligence in Transportation (Beijing Jiaotong University), Ministry of Education, Beijing, China; 3 Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing, China. Correspondence to: Jing Wang <EMAIL>.
Pseudocode Yes Appendix C: Pseudocode of adapt-STE; Algorithm 1: adapt-STE Estimator.
Open Source Code Yes The code is publicly available at https://github.com/moon2yue/ORTE.
Open Datasets Yes We evaluate our method on four carefully designed synthetic datasets with ground-truth annotations: Freq Shapes, Seq Comb-UV, Seq Comb-MV, and Low Var (Queen et al., 2024). ... Additionally, we test on four real-world datasets: ECG, arrhythmia detection (Moody & Mark, 2001); PAM, human activity recognition (Reiss & Stricker, 2012); Epilepsy, EEG seizure detection (Andrzejak et al., 2001); and Boiler, mechanical fault detection (Shohet et al., 2019).
Dataset Splits No The paper does not explicitly provide specific training/test/validation dataset splits, percentages, or sample counts. It refers to using pre-trained models and evaluating explanation methods on datasets with or without ground-truth annotations. For example, it mentions "masking the bottom p-percentile of features" and "inserting features from bottom to top" for evaluation but not the splits used for the black-box model training.
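The quoted evaluation protocol, masking the bottom p-percentile of features by attribution score, is a standard perturbation-based faithfulness check. A minimal sketch of that masking step is below; the function name, the zero baseline, and the percentile thresholding are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def mask_bottom_percentile(x, attributions, p, baseline=0.0):
    """Replace the bottom p-percentile of features, ranked by attribution
    score, with a baseline value (here: zero).

    x, attributions: arrays of the same shape (e.g. time steps x channels).
    p: percentile in [0, 100]; features whose attribution falls at or below
    the p-th percentile are masked.
    """
    threshold = np.percentile(attributions, p)
    return np.where(attributions <= threshold, baseline, x)

# Toy example: a 4-step univariate series with per-step attributions.
x = np.array([1.0, 2.0, 3.0, 4.0])
attr = np.array([0.1, 0.9, 0.2, 0.8])
masked = mask_bottom_percentile(x, attr, 50)  # masks the two least-attributed steps
```

The complementary "inserting features from bottom to top" metric would invert this: start from the all-baseline input and restore features in attribution order, tracking the model's prediction after each step.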
Hardware Specification Yes All experiments are performed on Ubuntu 18.04.6 LTS with four NVIDIA GeForce RTX 2080 GPUs.
Software Dependencies No The paper mentions "Ubuntu 18.04.6 LTS" but does not specify other software dependencies like programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or specific versions of these components.
Experiment Setup Yes Table 4. Training parameters for transformer-based predictors across all ground-truth and real-world datasets.

| Dataset        | Learning rate | Weight decay | Epochs | Num. layers | d_h | Dropout | Norm. embedding |
|----------------|---------------|--------------|--------|-------------|-----|---------|-----------------|
| FreqShape      | 0.001         | 0.1          | 100    | 1           | 16  | 0.1     | No              |
| SeqComb-UV     | 0.001         | 0.01         | 200    | 2           | 64  | 0.25    | No              |
| SeqComb-MV     | 0.0005        | 0.001        | 1000   | 2           | 128 | 0.25    | No              |
| LowVar         | 0.001         | 0.01         | 120    | 1           | 32  | 0.25    | Yes             |
| ECG            | 0.002         | 0.001        | 500    | 1           | 64  | 0.1     | Yes             |
| PAM            | 0.001         | 0.001        | 100    | 1           | 72  | 0.25    | No              |
| Epilepsy       | 0.0001        | 0.001        | 300    | 1           | 16  | 0.1     | No              |
| Boiler         | 0.001         | 0.001        | 500    | 1           | 32  | 0.25    | Yes             |
| Wafer          | 0.0001        | 0.001        | 200    | 1           | 16  | 0.1     | No              |
| FreezerRegular | 0.0001        | 0.001        | 300    | 1           | 16  | 0.1     | No              |
| Water          | 0.002         | 0.001        | 500    | 1           | 64  | 0.1     | Yes             |
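For reproduction, the per-dataset hyperparameters in Table 4 can be held in a small config mapping. The sketch below is a hypothetical layout (dictionary keys and the helper function are not from the ORTE codebase); two datasets are shown, with the remaining rows filled in the same way.

```python
# Hypothetical per-dataset training configs transcribed from Table 4.
TRAIN_PARAMS = {
    "FreqShape": dict(lr=0.001, weight_decay=0.1, epochs=100,
                      num_layers=1, d_h=16, dropout=0.1, norm_embedding=False),
    "ECG":       dict(lr=0.002, weight_decay=0.001, epochs=500,
                      num_layers=1, d_h=64, dropout=0.1, norm_embedding=True),
    # ... remaining datasets follow the same pattern.
}

def optimizer_config(dataset):
    """Return the optimizer-related settings for a dataset."""
    p = TRAIN_PARAMS[dataset]
    return {"lr": p["lr"], "weight_decay": p["weight_decay"]}
```

Keeping hyperparameters in one table-shaped structure makes it easy to diff a reproduction run against the reported settings.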