Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting
Authors: Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to evaluate the performance of Non-stationary Transformers on six real-world time series forecasting benchmarks and further validate the generality of the proposed framework on various mainstream Transformer variants. |
| Researcher Affiliation | Academia | Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long. School of Software, BNRist, Tsinghua University, China. {liuyong21,whx20}@mails.tsinghua.edu.cn, {jimwang,mingsheng}@tsinghua.edu.cn |
| Pseudocode | No | The paper describes its methodology using prose and mathematical equations, but it does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at this repository: https://github.com/thuml/Nonstationary_Transformers. |
| Open Datasets | Yes | Here are the descriptions of the datasets: (1) Electricity [3]... (2) ETT [37]... (3) Exchange [18]... (4) ILI [1]... (5) Traffic [2]... (6) Weather [4]... |
| Dataset Splits | Yes | We follow the standard protocol that divides each dataset into the training, validation, and testing subsets according to the chronological order. The split ratio is 6:2:2 for the ETT dataset and 7:1:2 for others. |
| Hardware Specification | Yes | All the experiments are implemented with PyTorch [28] and conducted on a single NVIDIA TITAN V 12GB GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch [28]' as the implementation framework but does not specify its version number or any other software dependencies with version details. |
| Experiment Setup | Yes | Each model is trained by ADAM [16] using L2 loss with the initial learning rate of 10⁻⁴ and batch size of 32. Each Transformer-based model contains two encoder layers and one decoder layer. Considering the efficiency of hyperparameter search, we use a two-layer perceptron projector with the hidden dimension varying in {64, 128, 256} in De-stationary Attention. |
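
The chronological split quoted in the "Dataset Splits" row above is simple to reproduce. The sketch below is not the authors' code; it is a minimal illustration assuming a NumPy array `series` of shape (time steps, variates), with the 6:2:2 ratio applied to ETT and 7:1:2 to the other benchmarks.

```python
# Hedged sketch (not the released code): chronological train/val/test split
# matching the ratios quoted above. `series` is a hypothetical (T, C) array.
import numpy as np

def chronological_split(series: np.ndarray, ratios=(0.7, 0.1, 0.2)):
    """Split along the time axis without shuffling, preserving temporal order."""
    t = len(series)
    n_train = int(t * ratios[0])
    n_val = int(t * ratios[1])
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test

# Example: ETT-style 6:2:2 split on placeholder data (10000 steps, 7 variates).
data = np.random.randn(10000, 7)
train, val, test = chronological_split(data, ratios=(0.6, 0.2, 0.2))
print(len(train), len(val), len(test))   # -> 6000 2000 2000
```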
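
The "Experiment Setup" row above maps onto a standard PyTorch training configuration. The sketch below is a hedged illustration, not the released implementation: `TwoLayerProjector` only mirrors the described two-layer perceptron (hidden width drawn from {64, 128, 256}), and the `nn.Linear` forecaster is a placeholder for the actual Transformer variants (two encoder layers, one decoder layer).

```python
# Hedged sketch of the quoted training configuration, assuming PyTorch.
import torch
import torch.nn as nn

class TwoLayerProjector(nn.Module):
    """Two-layer perceptron, illustrating the projector used by De-stationary Attention."""
    def __init__(self, in_dim: int, hidden_dim: int = 128, out_dim: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# In the paper, de-stationary factors are predicted from per-series statistics;
# here we only show the module's shape behaviour on placeholder statistics.
projector = TwoLayerProjector(in_dim=96, hidden_dim=128, out_dim=1)
tau = projector(torch.randn(32, 96))          # (32, 1) factor per sample

# Placeholder forecaster standing in for a Transformer variant.
model = nn.Linear(96, 24)
criterion = nn.MSELoss()                      # L2 loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

batch_x = torch.randn(32, 96)                 # batch size 32
batch_y = torch.randn(32, 24)
optimizer.zero_grad()
loss = criterion(model(batch_x), batch_y)
loss.backward()
optimizer.step()
```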