Adversarial Sparse Transformer for Time Series Forecasting
Authors: Sifan Wu, Xi Xiao, Qianggang Ding, Peilin Zhao, Ying Wei, Junzhou Huang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several real-world datasets show the effectiveness and efficiency of our method. [...] In this work, we present Adversarial Sparse Transformer (AST), a novel Transformer-based model for time series forecasting. By adversarial learning, we improve the continuity and fidelity at the sequence level. We further propose the Sparse Transformer to improve the ability to pay more attention to relevant steps in time series. Extensive experiments on a series of real-world time series datasets have demonstrated the effectiveness of AST for both short-term and long-term time series forecasting. |
| Researcher Affiliation | Collaboration | Sifan Wu (Tsinghua University, wusf18@mails.tsinghua.edu.cn); Xi Xiao (Tsinghua University / Peng Cheng Laboratory, xiaox@sz.tsinghua.edu.cn); Qianggang Ding (Tsinghua University, dqg18@mails.tsinghua.edu.cn); Peilin Zhao (Tencent AI Lab, masonzhao@tencent.com); Ying Wei (Tencent AI Lab, judywei@tencent.com); Junzhou Huang (University of Texas at Arlington / Tencent AI Lab, jzhuang@uta.edu) |
| Pseudocode | Yes | Algorithm 1: Adversarial Training for Time Series Forecasting (a hedged sketch of such a training loop is given after this table) |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use five public datasets: electricity, traffic, wind, solar, and M4-Hourly [17] for our evaluation. The electricity dataset is an hourly time series of electricity consumption of 370 customers. [...] electricity: https://archive.ics.uci.edu/ml/datasets/Electricity_Load_Diagrams_20112014; traffic: https://archive.ics.uci.edu/ml/datasets/PEMS-SF; wind: https://www.kaggle.com/sohier/30-years-of-european-wind-generation; solar: https://www.nrel.gov/grid/solar-power-data.html |
| Dataset Splits | No | Following [25], we generate multiple training windows by varying the start point over the original time series, with a fixed history length t0 and forecasting horizon τ (see the windowing sketch after this table). For short-term forecasting, we evaluate rolling-day forecasts for seven days after training, and the length of the conditioning range is set to one week of the time series (168 observations per series). For long-term forecasting, we directly forecast 7 days, and the length of the conditioning range is set to two weeks of the time series (336 observations per series). The paper describes how the data is used for training and prediction, but does not define a specific validation split with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Adam [10] for optimization but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | Therefore, in our paper, we set the parameter α = 1.5 (we further compare the performance of different α in the experiments). [...] The discriminator network D and the generator network G are trained jointly with Adam [10]. [...] where λ is the trade-off hyper-parameter that balances L_adv and L_ρ. [...] For short-term forecasting, we evaluate rolling-day forecasts for seven days after training, and the length of the conditioning range is set to one week of the time series (168 observations per series). For long-term forecasting, we directly forecast 7 days, and the length of the conditioning range is set to two weeks of the time series (336 observations per series). The sparse-attention sketch after this table illustrates the role of α; the training-loop sketch illustrates λ. |
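
Companion to the Pseudocode row: the paper's Algorithm 1 trains a Sparse Transformer generator G and a discriminator D jointly, with λ trading off the adversarial loss L_adv against the quantile loss L_ρ. Below is a minimal sketch of what such a loop could look like in PyTorch; `generator`, `discriminator`, the discriminator's input concatenation, and all hyper-parameter values are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def quantile_loss(y_hat, y, rho=0.5):
    # Pinball (quantile) loss L_rho; rho = 0.5 recovers the median case.
    diff = y - y_hat
    return torch.mean(torch.max(rho * diff, (rho - 1.0) * diff))

def train_step(generator, discriminator, g_opt, d_opt, x_hist, y_true, lam=0.1):
    # One joint step in the spirit of Algorithm 1. Shapes are assumed to be
    # [batch, time, features]; lam plays the paper's trade-off role of lambda.

    # Discriminator update: real futures vs. detached forecasts.
    with torch.no_grad():
        y_fake = generator(x_hist)
    d_real = discriminator(torch.cat([x_hist, y_true], dim=1))
    d_fake = discriminator(torch.cat([x_hist, y_fake], dim=1))
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: quantile loss L_rho plus lam * adversarial loss L_adv.
    y_fake = generator(x_hist)
    d_out = discriminator(torch.cat([x_hist, y_fake], dim=1))
    adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    g_loss = quantile_loss(y_fake, y_true) + lam * adv
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return g_loss.item(), d_loss.item()
```

Consistent with the paper's statement that D and G are trained jointly with Adam, both `g_opt` and `d_opt` would be `torch.optim.Adam` instances over the respective networks' parameters.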
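For the Dataset Splits row: the paper generates training windows by varying the start point over the series with a fixed history length t0 and forecasting horizon τ. A minimal sketch of that windowing, assuming hourly data; the stride and the toy series are illustrative, since the paper does not state them.

```python
import numpy as np

def make_windows(series, t0=168, tau=24, stride=24):
    # Slide a (history, target) window over the series: t0 conditioning
    # steps followed by tau forecast steps. The stride is an assumption.
    X, Y = [], []
    for start in range(0, len(series) - t0 - tau + 1, stride):
        X.append(series[start:start + t0])
        Y.append(series[start + t0:start + t0 + tau])
    return np.stack(X), np.stack(Y)

# Short-term setting from the paper: one week of hourly history (168 obs);
# the long-term setting would use t0=336 (two weeks).
hourly = np.sin(np.arange(5000) * 2 * np.pi / 24)  # toy hourly series
X, Y = make_windows(hourly, t0=168, tau=24)
print(X.shape, Y.shape)  # (n_windows, 168) (n_windows, 24)
```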
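For the Experiment Setup row: α = 1.5 is the α-entmax parameter that makes the attention distribution sparse. A hedged sketch of scaled dot-product attention with α-entmax replacing softmax, assuming the third-party entmax package (`pip install entmax`); this illustrates the Sparse Transformer idea, not the authors' implementation.

```python
import torch
from entmax import entmax15  # alpha-entmax with alpha = 1.5

def sparse_attention(q, k, v):
    # Scaled dot-product attention where entmax15 replaces softmax, so
    # irrelevant time steps receive exactly zero attention weight.
    scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
    weights = entmax15(scores, dim=-1)  # sparse; still sums to 1 along dim
    return weights @ v

q = k = v = torch.randn(2, 168, 64)  # [batch, time, d_model], illustrative
out = sparse_attention(q, k, v)
```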