Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
HDT: Hierarchical Discrete Transformer for Multivariate Time Series Forecasting
Authors: Feng Shibo, Peilin Zhao, Liu Liu, Pengcheng Wu, Zhiqi Shen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on five popular MTS datasets verify the effectiveness of our proposed method. The source code will be released. ... We conducted experiments to evaluate the performance and efficiency of HDT, covering short-term and long-term forecasting as well as robustness to missing values. The evaluation includes 5 real-world benchmarks and 12 baselines. |
| Researcher Affiliation | Collaboration | 1College of Computing and Data Science, Nanyang Technological University (NTU), Singapore 2Webank-NTU Joint Research Institute on Fintech, NTU, Singapore 3Tencent AI Lab, Shenzhen, China EMAIL, EMAIL |
| Pseudocode | Yes | The training and inference details are shown in Algorithm 1, 2 and 3. Figure 1 provides an overview of the model architecture. |
| Open Source Code | No | The source code will be released. |
| Open Datasets | Yes | We extensively evaluate the proposed HDT on five real-world benchmarks, covering the mainstream high-dimensional MTS probabilistic forecasting applications, Solar (Lai et al. 2018), Electricity (Lai et al. 2018), Traffic (Salinas et al. 2019), Taxi (Salinas et al. 2019) and Wikipedia (Gasthaus et al. 2019). |
| Dataset Splits | No | We sample 100 times to report metrics on the test set. All experiments are conducted on a single Nvidia A-100 GPU, and results are based on 3 runs. |
| Hardware Specification | Yes | All experiments are conducted on a single Nvidia A-100 GPU, and results are based on 3 runs. |
| Software Dependencies | No | Our method relies on the ADAM optimizer with initial learning rates of 0.0005 and 0.001, and a batch size of 64 across all datasets. |
| Experiment Setup | Yes | Our method relies on the ADAM optimizer with initial learning rates of 0.0005 and 0.001, and a batch size of 64 across all datasets. The history length is fixed at 96, with prediction lengths of {48, 96, 144}. We sample 100 times to report metrics on the test set. All experiments are conducted on a single Nvidia A-100 GPU, and results are based on 3 runs. |
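The reported experiment setup (Adam optimizer, initial learning rates of 0.0005 and 0.001, batch size 64, history length 96, prediction lengths {48, 96, 144}, 100 test-set samples, 3 runs) can be collected into a single configuration object. The sketch below is illustrative only: the class and field names are assumptions, and the paper excerpt does not state which of the two learning rates applies to which component of the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HDTExperimentConfig:
    """Hyperparameters quoted in the paper's experiment setup.

    Field names are hypothetical; the excerpt does not say which
    learning rate applies to which part of the model.
    """
    optimizer: str = "Adam"
    learning_rates: tuple = (0.0005, 0.001)   # two initial LRs reported
    batch_size: int = 64                      # same across all datasets
    history_length: int = 96                  # fixed input window
    prediction_lengths: tuple = (48, 96, 144) # evaluated horizons
    num_test_samples: int = 100               # samples drawn on the test set
    num_runs: int = 3                         # reported results average 3 runs


config = HDTExperimentConfig()
print(config.batch_size, config.prediction_lengths)
```

A frozen dataclass is used here so the quoted hyperparameters cannot be mutated accidentally when passed between training and evaluation code.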