Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting

Authors: Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to evaluate the performance of Non-stationary Transformers on six real-world time series forecasting benchmarks and further validate the generality of the proposed framework on various mainstream Transformer variants.
Researcher Affiliation | Academia | Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long (corresponding author), School of Software, BNRist, Tsinghua University, China. {liuyong21,whx20}@mails.tsinghua.edu.cn, {jimwang,mingsheng}@tsinghua.edu.cn
Pseudocode | No | The paper describes its methodology using prose and mathematical equations, but it does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at this repository: https://github.com/thuml/Nonstationary_Transformers.
Open Datasets | Yes | Datasets: Here are the descriptions of the datasets: (1) Electricity [3]... (2) ETT [37]... (3) Exchange [18]... (4) ILI [1]... (5) Traffic [2]... (6) Weather [4]...
Dataset Splits | Yes | We follow the standard protocol that divides each dataset into the training, validation, and testing subsets according to the chronological order. The split ratio is 6:2:2 for the ETT dataset and 7:1:2 for others. (A splitting sketch is given after the table.)
Hardware Specification | Yes | All the experiments are implemented with PyTorch [28] and conducted on a single NVIDIA TITAN V 12GB GPU.
Software Dependencies | No | The paper mentions 'PyTorch [28]' as the implementation framework but does not specify its version number or any other software dependencies with version details.
Experiment Setup | Yes | Each model is trained by ADAM [16] using L2 loss with the initial learning rate of 10^-4 and batch size of 32. Each Transformer-based model contains two encoder layers and one decoder layer. Considering the efficiency of hyperparameter search, we use a two-layer perceptron projector with the hidden dimension varying in {64, 128, 256} in De-stationary Attention. (A configuration sketch follows the table.)
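
For reference, the chronological split protocol quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration assuming a time-ordered pandas DataFrame; the function name and signature are illustrative and are not taken from the authors' repository.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, ratios=(0.7, 0.1, 0.2)):
    """Split a time-ordered DataFrame into train/val/test without shuffling.

    ratios gives the (train, val, test) fractions; the paper reports
    6:2:2 for the ETT dataset and 7:1:2 for the other benchmarks.
    """
    n = len(df)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = df.iloc[:n_train]
    val = df.iloc[n_train:n_train + n_val]
    test = df.iloc[n_train + n_val:]
    return train, val, test

# Example usage (ett_df is a hypothetical time-ordered DataFrame):
# train, val, test = chronological_split(ett_df, ratios=(0.6, 0.2, 0.2))
```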
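The experiment setup quoted above can likewise be sketched in PyTorch. This is a simplified sketch, not the authors' implementation: the two-layer perceptron projector's real inputs and output shapes (the de-stationarization statistics used by De-stationary Attention) are handled in the released code, and the in_dim/out_dim values below are placeholders.

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Two-layer MLP projector; the paper searches the hidden dimension
    over {64, 128, 256}. Input and output dimensions are placeholders."""
    def __init__(self, in_dim: int, hidden_dim: int = 128, out_dim: int = 1):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, stats: torch.Tensor) -> torch.Tensor:
        return self.mlp(stats)

# Reported training configuration: Adam optimizer, L2 (MSE) loss,
# initial learning rate 1e-4, batch size 32.
model = Projector(in_dim=16)  # in_dim chosen arbitrarily for this sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()
batch_size = 32
```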