Ada-MSHyper: Adaptive Multi-Scale Hypergraph Transformer for Time Series Forecasting

Authors: Zongjiang Shang, Ling Chen, Binqing Wu, Dongliang Cui

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 11 real-world datasets demonstrate that Ada-MSHyper achieves state-of-the-art performance, reducing prediction errors by an average of 4.56%, 10.38%, and 4.97% in MSE for long-range, short-range, and ultra-long-range time series forecasting, respectively. |
| Researcher Affiliation | Academia | Zongjiang Shang, Ling Chen, Binqing Wu, Dongliang Cui; State Key Laboratory of Blockchain and Data Security, College of Computer Science and Technology, Zhejiang University ({zongjiangshang, lingchen, binqingwu, runnercdl}@cs.zju.edu.cn) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (i.e., clearly labeled algorithm sections or code-formatted procedures). |
| Open Source Code | Yes | Code is available at https://github.com/shangzongjiang/Ada-MSHyper. |
| Open Datasets | Yes | For long-range time series forecasting, we conduct experiments on 7 commonly used benchmarks, including the ETT (ETTh1, ETTh2, ETTm1, and ETTm2), Traffic, Electricity, and Weather datasets, following [30, 21, 26]. For short-range time series forecasting, we adopt 4 benchmarks from PEMS (PEMS03, PEMS04, PEMS07, and PEMS08), following [21, 27]. |
| Dataset Splits | Yes | We split each dataset into training, validation, and test sets based on chronological order. For the PEMS (PEMS03, PEMS04, PEMS07, and PEMS08) and ETT (ETTh1, ETTh2, ETTm1, and ETTm2) datasets, the train-validation-test split ratio is 6:2:2. For the Weather, Traffic, and Electricity datasets, the train-validation-test split ratio is 7:2:1. (See the split sketch after this table.) |
| Hardware Specification | Yes | Ada-MSHyper is trained/tested on a single NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using PyTorch (in its self-reflection section) and torch_geometry for hypergraph computation, but it does not specify version numbers for these software components. For instance, 'the optimization of hypergraph computation provided by torch_geometry [2]' is mentioned without a version. |
| Experiment Setup | Yes | Adam is set as the optimizer with an initial learning rate of 10^-4. It is notable that the above-mentioned baseline results cannot be used directly due to different input and output lengths. For a fair comparison, we set the commonly used input length T = 96 and output lengths H ∈ {96, 192, 336, 720} for long-range forecasting, H ∈ {12, 24, 48} for short-range forecasting, and H ∈ {1080, 1440, 1800, 2160} for ultra-long-range forecasting. The maximum number of scales S is set to 3... The detailed search space of hyperparameters is given in Table 9. (See the setup sketch after this table.) |
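
As referenced in the Dataset Splits row above, the chronological split is simple enough to sketch in code. This is a minimal illustration, assuming the usual `(time_steps, num_variables)` array layout; the helper name `chronological_split` is hypothetical and not taken from the released Ada-MSHyper code.

```python
import numpy as np

def chronological_split(data, train_ratio, val_ratio):
    """Split a series into contiguous train/val/test blocks in time order.

    Illustrative helper (not from the authors' repository); `data` is
    assumed to be shaped (time_steps, num_variables).
    """
    n = len(data)
    train_end = int(n * train_ratio)
    val_end = train_end + int(n * val_ratio)
    return data[:train_end], data[train_end:val_end], data[val_end:]

series = np.random.randn(10_000, 7)  # synthetic stand-in for e.g. ETTh1

# ETT and PEMS datasets: 6:2:2 split ratio
train, val, test = chronological_split(series, 0.6, 0.2)

# Weather, Traffic, and Electricity datasets: 7:2:1 split ratio
train, val, test = chronological_split(series, 0.7, 0.2)
```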
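
Likewise, the reported training configuration can be sketched as follows. Only the hyperparameters quoted in the Experiment Setup row are taken from the paper; the `nn.Linear` placeholder merely stands in for the Ada-MSHyper model from the linked repository so that the optimizer setup is runnable.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted from the paper's reported setup.
INPUT_LEN = 96                                   # input length T
HORIZONS = {
    "long":       [96, 192, 336, 720],
    "short":      [12, 24, 48],
    "ultra_long": [1080, 1440, 1800, 2160],
}
MAX_SCALES = 3                                   # maximum number of scales S

# Placeholder model: the real Ada-MSHyper architecture lives in the
# authors' repository and is not reproduced here.
model = nn.Linear(INPUT_LEN, HORIZONS["long"][0])

# Adam optimizer with the reported initial learning rate of 1e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()  # MSE is the error metric reported for all regimes
```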