Introducing Spectral Attention for Long-Range Dependency in Time Series Forecasting
Authors: Bong Gyun Kang, Dongjun Lee, HyunGi Kim, Dohyun Chung, Sungroh Yoon
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on 11 real-world time series datasets using 7 recent forecasting models, we consistently demonstrate the efficacy of our Spectral Attention mechanism, achieving state-of-the-art results. |
| Researcher Affiliation | Academia | Bong Gyun Kang1 Dongjun Lee1 HyunGi Kim2 Dohyun Chung3 Sungroh Yoon1,2 — 1 Interdisciplinary Program in Artificial Intelligence, Seoul National University; 2 Department of Electrical and Computer Engineering, Seoul National University; 3 Department of Future Automotive Mobility, Seoul National University |
| Pseudocode | Yes | Algorithm 1 Batched Spectral Attention (1 epoch) |
| Open Source Code | Yes | The full code is available at https://github.com/DJLee1208/BSA_2024. |
| Open Datasets | Yes | We use eleven real-world public datasets: Weather, Traffic, ECL, ETT (4 sub-datasets; h1, h2, m1, m2), Exchange, PEMS03, Energy Data, and Illness [6, 26, 29, 49]. All these public datasets were downloaded from the referenced sources in March 2024. |
| Dataset Splits | Yes | Train, validation, and test split ratios are 0.6, 0.2, 0.2 for the ETT dataset and 0.7, 0.1, 0.2 for the Weather, Traffic, ECL, Exchange, PEMS03, Energy Data, and Illness datasets. Model selection and hyperparameter search are conducted based on the validation set. (A chronological split sketch follows the table.) |
| Hardware Specification | Yes | Each experiment was conducted on a single NVIDIA GeForce RTX 3090 Ti, NVIDIA A40, or NVIDIA L40 GPU. |
| Software Dependencies | No | The whole code is implemented in PyTorch [38]. |
| Experiment Setup | Yes | We first train the base model for more than 30 epochs (20 epochs for the Traffic dataset) using Adam [22] to ensure that the validation MSE saturates, while also conducting an extensive hyperparameter search. The hyperparameter search space for the base model is as follows: the possible learning rate is (0.03, 0.01, 0.003, 0.001, 0.0003), and the weight decay is (0.01, 0.003, 0.001, 0.0003, 0.0001, 0.00003). The hyperparameter search space for BSA fine-tuning is as follows: the possible learning rate for the SA-Matrix in the BSA module is (0.08, 0.05, 0.03, 0.01, 0.003, 0.001); the learning rate for the rest of the model, i.e., the original modules, is (0.01, 0.003, 0.001, 0.0003, 0.0001, 0.00003, 0.00001); the learning rate for the smoothing factor αk is (none, 0.03, 0.01, 0.003, 0.001, 0.0001, 0.00001); and the initialization for the smoothing factor αk is ([0.9, 0.99, 0.999], [0.9, 0.99, 0.999, 0.999], [0.9, 0.95, 0.992, 0.999], [0.8, 0.96, 0.992, 0.9984, 0.99968]). The default batch size for baseline model saturation is 64, while for our method, which involves fine-tuning after integrating the BSA module, it is 256. We used the Adam [22] optimizer and L2 loss (MSE loss) for model optimization. (An optimizer-setup sketch follows the table.) |
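The dataset-splits row quotes chronological ratios per dataset group. Below is a minimal sketch of how such splits are typically computed; the helper name `split_series` and the placeholder array are hypothetical, and only the ratios (0.6/0.2/0.2 for ETT, 0.7/0.1/0.2 for the remaining datasets) come from the paper.

```python
import numpy as np

def split_series(data, train_ratio, val_ratio):
    """Chronologically split a time series into train/val/test segments.

    Hypothetical helper for illustration; only the split ratios are taken
    from the paper's reported setup.
    """
    n = len(data)
    train_end = int(n * train_ratio)
    val_end = train_end + int(n * val_ratio)
    return data[:train_end], data[train_end:val_end], data[val_end:]

series = np.arange(10_000)                          # placeholder series
train, val, test = split_series(series, 0.7, 0.1)   # Weather/Traffic/ECL/Exchange/PEMS03/Energy/Illness
# For the four ETT sub-datasets the call would be split_series(series, 0.6, 0.2).
```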
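The experiment-setup row describes fine-tuning with three separate learning rates (the SA-Matrix, the smoothing factors αk, and the original modules), the Adam optimizer, and an MSE loss. The sketch below shows one way to wire those parameter groups in PyTorch; the class and attribute names (`ToyBSAModel`, `sa_matrix`, `alpha`, `backbone`) are hypothetical stand-ins for the released code, and the learning-rate and weight-decay values are simply picks from the quoted search spaces.

```python
import torch
import torch.nn as nn

class ToyBSAModel(nn.Module):
    """Stand-in module; attribute names are hypothetical and exist only to
    illustrate the three parameter groups described in the setup."""
    def __init__(self, n_feat=8, n_scales=3):
        super().__init__()
        self.backbone = nn.Linear(n_feat, n_feat)                       # original forecasting model
        self.sa_matrix = nn.Parameter(torch.eye(n_scales))              # BSA SA-Matrix
        self.alpha = nn.Parameter(torch.tensor([0.9, 0.99, 0.999]))     # smoothing factors (one quoted init)

model = ToyBSAModel()
optimizer = torch.optim.Adam(
    [
        {"params": [model.sa_matrix], "lr": 0.03},                # SA-Matrix learning rate
        {"params": [model.alpha], "lr": 0.003},                   # smoothing-factor learning rate
        {"params": model.backbone.parameters(), "lr": 0.0003},    # original-module learning rate
    ],
    weight_decay=0.0003,
)
criterion = nn.MSELoss()  # L2 loss used for model optimization
```

Fine-tuning would then run a standard training loop with batch size 256 (the quoted default for the BSA fine-tuning stage), while the base-model saturation stage uses batch size 64.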