Are Self-Attentions Effective for Time Series Forecasting?
Authors: Dongbin Kim, Jinseong Park, Jaewook Lee, Hoki Kim
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across various datasets demonstrate that our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models. The implementation of our model is available at: https://github.com/dongbeank/CATS. |
| Researcher Affiliation | Academia | ¹Seoul National University, ²Chung-Ang University; {dongbin413,jinseong,jaewook}@snu.ac.kr, hokikim@cau.ac.kr |
| Pseudocode | No | The paper describes the model architecture and components in detail through text and diagrams (Figure 4) but does not include a formal pseudocode block or algorithm (an illustrative cross-attention sketch is given after this table). |
| Open Source Code | Yes | The implementation of our model is available at: https://github.com/dongbeank/CATS. |
| Open Datasets | Yes | To this end, we use 7 different real-world datasets and 9 baseline models. For datasets, we use Electricity, ETT (ETTh1, ETTh2, ETTm1, and ETTm2), Weather, Traffic, and M4. These datasets are provided in [23] and [24] for time series forecasting benchmark, detailed in Appendix. |
| Dataset Splits | Yes | For the forecasting horizon T, we also used the widely accepted values, i.e., [96, 192, 336, 720]. In all configurations, we adopt the GeGLU activation function [16] between the two linear layers in the feed-forward network of our model. Additionally, we use learnable positional embedding parameters for the input data and omit positional embeddings for learnable queries to avoid redundant parameter learning. For the experiments summarized in Table 4 and Table 11, our model uses three cross-attention layers with embedding size D = 256 and H = 32 attention heads. Specifically, to avoid overfitting on small datasets [14], we use patch length 48 on the ETTh1 and ETTh2 datasets. Further details on the hyperparameter settings for these experiments are provided in Table 9. (A GeGLU feed-forward sketch follows this table.) |
| Hardware Specification | Yes | We used 4 NVIDIA RTX 4090 24GB GPUs with 2 Intel(R) Xeon(R) Gold 5218R CPUs @ 2.10GHz for all experiments. |
| Software Dependencies | No | The paper mentions using a 'GeGLU activation function' but does not specify version numbers for general software dependencies like Python, PyTorch, TensorFlow, or CUDA libraries. |
| Experiment Setup | Yes | For the forecasting horizon T, we also used the widely accepted values, i.e., [96, 192, 336, 720]. In all configurations, we adopt the GeGLU activation function [16] between the two linear layers in the feed-forward network of our model. Additionally, we use learnable positional embedding parameters for the input data and omit positional embeddings for learnable queries to avoid redundant parameter learning. ... Further details on the hyperparameter settings for these experiments are provided in Table 9. (Table 9 specifies Layers, Embedding Size, Query Sharing, Input Sequence Length, Batch Size, Epoch, and Learning Rate; an illustrative configuration sketch follows this table.) |
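
The rows above describe a cross-attention-only architecture in which learnable queries attend to embedded input patches, with positional embeddings applied to the inputs but not to the queries. The following is a minimal PyTorch sketch of that attention pattern, assuming standard `nn.MultiheadAttention` and hypothetical shapes (`num_patches`, `horizon_queries`); it is not the authors' implementation, which is available at https://github.com/dongbeank/CATS.

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Hypothetical cross-attention block: learnable queries attend to input patches."""
    def __init__(self, num_patches, d_model=256, n_heads=32, horizon_queries=8):
        super().__init__()
        # Learnable queries (no positional embedding, per the quoted setup).
        self.queries = nn.Parameter(torch.randn(horizon_queries, d_model))
        # Learnable positional embedding for the input patch embeddings.
        self.pos_emb = nn.Parameter(torch.randn(num_patches, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, patch_emb):               # patch_emb: (batch, num_patches, d_model)
        kv = patch_emb + self.pos_emb            # positional embedding on inputs only
        q = self.queries.expand(patch_emb.size(0), -1, -1)
        out, _ = self.attn(q, kv, kv)            # cross-attention, no self-attention
        return out                               # (batch, horizon_queries, d_model)

# Example with hypothetical shapes: batch of 8 series, 16 patches, D = 256.
x = torch.randn(8, 16, 256)
print(CrossAttentionBlock(num_patches=16)(x).shape)  # torch.Size([8, 8, 256])
```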
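The quoted setup adopts the GeGLU activation [16] between the two linear layers of the feed-forward network. A minimal sketch of such a block is shown below, assuming a hypothetical hidden width `d_ff`; the exact widths and layer placement in CATS may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUFeedForward(nn.Module):
    """Feed-forward block with a GeGLU gate (Shazeer, 2020): GeGLU(x) = GELU(xW) * (xV)."""
    def __init__(self, d_model=256, d_ff=512):
        super().__init__()
        self.proj_in = nn.Linear(d_model, 2 * d_ff)  # value and gate paths in one projection
        self.proj_out = nn.Linear(d_ff, d_model)

    def forward(self, x):
        value, gate = self.proj_in(x).chunk(2, dim=-1)
        return self.proj_out(value * F.gelu(gate))   # gate the value path, then project back
```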
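For reference, the hyperparameters quoted in the table can be collected into a single configuration. Values not stated in the excerpts are left as placeholders rather than guessed; the full settings are in the paper's Table 9.

```python
# Illustrative configuration assembled from the excerpts above; None marks
# values the excerpts do not state (see the paper's Table 9 for the full settings).
config = {
    "horizons": [96, 192, 336, 720],  # forecasting horizons T
    "layers": 3,                      # cross-attention layers (Tables 4 and 11 setting)
    "d_model": 256,                   # embedding size D
    "n_heads": 32,                    # attention heads H
    "activation": "GeGLU",            # feed-forward activation [16]
    "patch_len_etth": 48,             # patch length on ETTh1/ETTh2 (to avoid overfitting)
    "input_seq_len": None,            # listed in Table 9, not quoted here
    "batch_size": None,               # listed in Table 9, not quoted here
    "learning_rate": None,            # listed in Table 9, not quoted here
}
```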