DDN: Dual-domain Dynamic Normalization for Non-stationary Time Series Forecasting
Authors: Tao Dai, Beiliang Wu, Peiyuan Liu, Naiqi Li, Xue Yuerong, Shu-Tao Xia, Zexuan Zhu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on public benchmark datasets under different forecasting models demonstrate the superiority of our DDN over other normalization methods. |
| Researcher Affiliation | Academia | (1) College of Computer Science and Software Engineering, Shenzhen University, China; (2) Tsinghua Shenzhen International Graduate School, Tsinghua University, China |
| Pseudocode | No | The paper describes the proposed method and its components using mathematical equations and textual descriptions, but it does not include a formally structured pseudocode block or algorithm listing. |
| Open Source Code | Yes | Code is available at https://github.com/Hank0626/DDN. |
| Open Datasets | Yes | Extensive experiments on public benchmark datasets... We provide access to all datasets through https://github.com/thuml/iTransformer. |
| Dataset Splits | Yes | We partition all datasets chronologically into training, validation, and testing subsets. Specifically, for the ETT datasets, we adopted a 6:2:2 split ratio, while a 7:1:2 ratio was utilized for the other datasets. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | All experiments were conducted using PyTorch on a single NVIDIA 3090 24GB GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' and the 'ADAM optimizer [39]' but does not provide specific version numbers for these software components, which is required for reproducibility. |
| Experiment Setup | Yes | The Mean Square Error (MSE) and Mean Absolute Error (MAE) are chosen as evaluation metrics, with MSE serving as the training loss. All models use the same prediction lengths T = {96, 192, 336, 720}. For the look-back window L, Autoformer [20] and FEDformer [19] use L = 96, while DLinear [27] and iTransformer [24] use L = 336 and L = 720, respectively. The wavelet bases are initialized to the Coiflet bases, the default size of the sliding window is set to 7 to balance information content and temporal locality, and α starts at zero. We utilize the ADAM optimizer [39] with an initial learning rate of 1e-4 for the distribution prediction model and employ Mean Squared Error (MSE) loss. The batch size, training epochs, and other baseline settings remain consistent with iTransformer [24]. The network for mean or standard deviation prediction comprises two feed-forward neural network (FFN) layers, with dimensions of 512 for the first layer and 1024 for the second layer. We initialize the wavelet as Coiflet3, with α starting from 0. We conduct pre-training for 5 epochs and commence collaborative training from either the first or second epoch depending on the specific setting and dataset, aiming for improved training and model fitting. (A minimal sketch of the distribution prediction network and optimizer setup follows the table.) |
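The chronological split reported in the Dataset Splits row (6:2:2 for the ETT datasets, 7:1:2 for the others) can be illustrated with the minimal sketch below. The function name and the exact boundary handling are assumptions for illustration, not taken from the DDN codebase.

```python
import numpy as np

def chronological_split(series: np.ndarray, ratios=(0.7, 0.1, 0.2)):
    """Split a [time, channels] series chronologically into train/val/test.

    The paper reports a 6:2:2 ratio for the ETT datasets and 7:1:2 for the
    other benchmarks; boundary rounding here is an assumption.
    """
    n = len(series)
    train_end = int(n * ratios[0])
    val_end = train_end + int(n * ratios[1])
    return series[:train_end], series[train_end:val_end], series[val_end:]

# Hypothetical usage on a dummy 7-channel series with 10,000 time steps.
data = np.random.randn(10_000, 7)
train, val, test = chronological_split(data, ratios=(0.6, 0.2, 0.2))  # ETT-style 6:2:2
```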
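The Experiment Setup row describes a mean/standard-deviation prediction network with two FFN layers of dimensions 512 and 1024, trained with ADAM at a learning rate of 1e-4 and MSE loss. The following PyTorch sketch reflects those reported settings; the input/output dimensions, the activation function, and the final projection layer are assumptions added to make the sketch self-contained, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

class DistributionPredictor(nn.Module):
    """Minimal sketch of the mean/std prediction network described in the paper:
    two feed-forward (FFN) layers with hidden sizes 512 and 1024."""

    def __init__(self, lookback: int, horizon: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(lookback, 512),   # first FFN layer, dimension 512 (per the paper)
            nn.ReLU(),                  # activation is an assumption
            nn.Linear(512, 1024),       # second FFN layer, dimension 1024 (per the paper)
            nn.ReLU(),
            nn.Linear(1024, horizon),   # projection to the prediction length (assumed)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, channels, lookback] window statistics (shape is an assumption)
        return self.net(x)

model = DistributionPredictor(lookback=336, horizon=96)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # ADAM with initial lr 1e-4, as reported
criterion = nn.MSELoss()                                    # MSE is the training loss
```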