SCAT: A Time Series Forecasting with Spectral Central Alternating Transformers

Authors: Chengjie Zhou, Chao Che, Pengfei Wang, Qiang Zhang

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on ten real-world datasets, encompassing Wind, Electricity, Weather, and others, demonstrate that our Spectral Central Alternating Transformer (SCAT) outperforms state-of-the-art (SOTA) methods by an average of 17.5% in power time series forecasting.
Researcher Affiliation | Academia | 1) School of Computer Science and Technology, Dalian University of Technology; 2) Key Laboratory of Advanced Design and Intelligent Computing (Dalian University), Ministry of Education, Dalian University
Pseudocode | No | The paper includes architectural diagrams and mathematical formulas but does not provide a clearly labeled pseudocode block or algorithm.
Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code for the methodology, nor a link to a code repository.
Open Datasets | No | The paper names several datasets (Windm1-5, ETTm1-2, Weather, Traffic, Electricity) and describes their characteristics in Table 2 and Section 4.1. However, it does not provide links, DOIs, repository names, or formal citations (author names and year) through which these datasets can be accessed.
Dataset Splits | Yes | Table 2: The details of datasets are presented in Table 2. Dataset Size respectively denotes the total number of time points in the (Train, Validation, Test) split. E.g., Windm1: (23740, 3391, 6783); a sketch of such a contiguous split is given after the table.
Hardware Specification | Yes | The experimental framework used PyTorch and the algorithm ran on an RTX 3080 GPU.
Software Dependencies | No | The paper mentions using 'Pytorch' as the experimental framework but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | The SCAT architecture includes two encoder layers and one decoder layer. We set the cluster center dimension to k, fixed the model dimension at 512, and used eight attention components. Additionally, we applied a dropout rate of 0.1. The training batch size was 32. We utilized Adam with an initial learning rate in {10^-3, 5×10^-4, 10^-4, 5×10^-5}. (A configuration sketch based on these values follows the table.)
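
The split counts quoted in the Dataset Splits row are enough to reproduce a standard contiguous train/validation/test partition. The paper's preprocessing code is not released, so the following is only a minimal sketch assuming the usual chronological split for these benchmarks; the function name chronological_split and the dummy 7-feature series are illustrative, not taken from the paper.

```python
import numpy as np

def chronological_split(series, n_train, n_val, n_test):
    """Split a [T, features] array into contiguous train/val/test segments."""
    assert len(series) >= n_train + n_val + n_test
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

# Dummy multivariate series with the Windm1 counts from Table 2
# (23740 train, 3391 validation, 6783 test time points).
series = np.random.randn(23740 + 3391 + 6783, 7)
train, val, test = chronological_split(series, 23740, 3391, 6783)
print(train.shape, val.shape, test.shape)  # (23740, 7) (3391, 7) (6783, 7)
```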
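
Likewise, the hyperparameters in the Experiment Setup row can be gathered into a training configuration. Because SCAT is not open source, the sketch below uses a plain torch.nn.Transformer as a stand-in for the architecture; only the reported values (two encoder layers, one decoder layer, model dimension 512, eight heads, dropout 0.1, batch size 32, Adam with the listed initial learning rates) come from the paper, and everything else is an assumption.

```python
import torch
import torch.nn as nn

# Hyperparameters reported in the Experiment Setup row.
D_MODEL = 512                 # fixed model dimension
N_HEADS = 8                   # "eight attention components"
N_ENCODER_LAYERS = 2          # two encoder layers
N_DECODER_LAYERS = 1          # one decoder layer
DROPOUT = 0.1
BATCH_SIZE = 32
LEARNING_RATES = [1e-3, 5e-4, 1e-4, 5e-5]  # initial learning rates searched

# Stand-in model: a vanilla Transformer, NOT the SCAT architecture itself.
model = nn.Transformer(
    d_model=D_MODEL,
    nhead=N_HEADS,
    num_encoder_layers=N_ENCODER_LAYERS,
    num_decoder_layers=N_DECODER_LAYERS,
    dropout=DROPOUT,
    batch_first=True,
)

# Adam optimizer with one of the reported initial learning rates.
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATES[0])
```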