Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting
Authors: Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to evaluate the performance of Non-stationary Transformers on six real-world time series forecasting benchmarks and further validate the generality of the proposed framework on various mainstream Transformer variants. |
| Researcher Affiliation | Academia | Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long. School of Software, BNRist, Tsinghua University, China. {liuyong21,whx20}@mails.tsinghua.edu.cn, {jimwang,mingsheng}@tsinghua.edu.cn |
| Pseudocode | No | The paper describes its methodology using prose and mathematical equations, but it does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at this repository: https://github.com/thuml/Nonstationary_Transformers. |
| Open Datasets | Yes | Here are the descriptions of the datasets: (1) Electricity [3]... (2) ETT [37]... (3) Exchange [18]... (4) ILI [1]... (5) Traffic [2]... (6) Weather [4]... |
| Dataset Splits | Yes | We follow the standard protocol that divides each dataset into the training, validation, and testing subsets according to the chronological order. The split ratio is 6:2:2 for the ETT dataset and 7:1:2 for others. |
| Hardware Specification | Yes | All the experiments are implemented with PyTorch [28] and conducted on a single NVIDIA TITAN V 12GB GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch [28]' as the implementation framework but does not specify its version number or any other software dependencies with version details. |
| Experiment Setup | Yes | Each model is trained by ADAM [16] using L2 loss with the initial learning rate of 10⁻⁴ and batch size of 32. Each Transformer-based model contains two encoder layers and one decoder layer. Considering the efficiency of hyperparameter search, we use a two-layer perceptron projector with the hidden dimension varying in {64, 128, 256} in De-stationary Attention. |
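
The chronological split quoted in the "Dataset Splits" row above is simple to reproduce. The sketch below is not the authors' code; it is a minimal illustration assuming a NumPy array `series` of shape (time steps, variates), with the 6:2:2 ratio applied to ETT and 7:1:2 to the other benchmarks.

```python
# Hedged sketch (not the released code): chronological train/val/test split
# matching the ratios quoted above. `series` is a hypothetical (T, C) array.
import numpy as np

def chronological_split(series: np.ndarray, ratios=(0.7, 0.1, 0.2)):
    """Split along the time axis without shuffling, preserving temporal order."""
    t = len(series)
    n_train = int(t * ratios[0])
    n_val = int(t * ratios[1])
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test

# Example: ETT-style 6:2:2 split on placeholder data (10000 steps, 7 variates).
data = np.random.randn(10000, 7)
train, val, test = chronological_split(data, ratios=(0.6, 0.2, 0.2))
print(len(train), len(val), len(test))   # -> 6000 2000 2000
```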
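
The "Experiment Setup" row above maps onto a standard PyTorch training configuration. The sketch below is a hedged illustration, not the released implementation: `TwoLayerProjector` only mirrors the described two-layer perceptron (hidden width drawn from {64, 128, 256}), and the `nn.Linear` forecaster is a placeholder for the actual Transformer variants (two encoder layers, one decoder layer).

```python
# Hedged sketch of the quoted training configuration, assuming PyTorch.
import torch
import torch.nn as nn

class TwoLayerProjector(nn.Module):
    """Two-layer perceptron, illustrating the projector used by De-stationary Attention."""
    def __init__(self, in_dim: int, hidden_dim: int = 128, out_dim: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# In the paper, de-stationary factors are predicted from per-series statistics;
# here we only show the module's shape behaviour on placeholder statistics.
projector = TwoLayerProjector(in_dim=96, hidden_dim=128, out_dim=1)
tau = projector(torch.randn(32, 96))          # (32, 1) factor per sample

# Placeholder forecaster standing in for a Transformer variant.
model = nn.Linear(96, 24)
criterion = nn.MSELoss()                      # L2 loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

batch_x = torch.randn(32, 96)                 # batch size 32
batch_y = torch.randn(32, 24)
optimizer.zero_grad()
loss = criterion(model(batch_x), batch_y)
loss.backward()
optimizer.step()
```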