Topological Attention for Time Series Forecasting
Authors: Sebastian Zeng, Florian Graf, Christoph Hofer, Roland Kwitt
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach easily integrates into existing end-to-end trainable forecasting models, such as N-BEATS, and, in combination with the latter, exhibits state-of-the-art performance on the large-scale M4 benchmark dataset of 100,000 diverse time series from different domains. Ablation experiments, as well as a comparison to a broad range of forecasting methods in a setting where only a single time series is available for training, corroborate the beneficial nature of including local topological information through an attention mechanism. |
| Researcher Affiliation | Academia | Sebastian Zeng, Department of Computer Science, University of Salzburg (sebastian.zeng@plus.ac.at); Florian Graf, Department of Computer Science, University of Salzburg (florian.graf@plus.ac.at); Christoph Hofer, Department of Computer Science, University of Salzburg (chofer@cosy.sbg.ac.at); Roland Kwitt, Department of Computer Science, University of Salzburg (roland.kwitt@plus.ac.at) |
| Pseudocode | No | No pseudocode or algorithm block is explicitly labeled or presented in a structured format. |
| Open Source Code | Yes | Source code is publicly available at https://github.com/plus-rkwitt/TAN. |
| Open Datasets | Yes | To experiment with several single (but long) time series of different characteristics, we use 10 time series from the publicly available electricity [12] demand dataset and four (third-party) time series of car part demands, denoted as car-parts. (...) Experiments are based on the publicly available M4 dataset, consisting of 100,000 time series from six domains... |
| Dataset Splits | Yes | For each time series, 20% of held-out consecutive observations are used for testing, 5% for validation. (...) To this end, we cross-validate T (for all methods) using the sMAPE on the validation set. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not specify the hardware used for experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions using Ripser [3], the ADAM optimizer, the GluonTS [2] library, and N-BEATS [29] (with modifications), but does not provide specific version numbers for these software components. (A barcode sketch follows the table.) |
| Experiment Setup | Yes | In terms of hyperparameters for topological attention, we use a single transformer encoder layer with four attention heads and 32 barcode coordinate functions. We minimize the mean-squared error via ADAM over 1.5k (electricity) and 2k (car-parts) iterations, respectively, with a batch size of 30. The initial learning rate of the linear map in Eq. (11) is set to 9e-2; the initial learning rates for the components of Eq. (7) are listed in Section 4.3.2, scaled up by a factor of 10. (...) The model uses 20 transformer encoder layers with four attention heads and 64 structure elements for barcode vectorization. For optimization, we use ADAM with initial learning rates of 1e-3 (for N-BEATS and the MLP part of Eq. (7)), 8e-3 (Top Vec) and 5e-3 (Transformer Encoder). All learning rates are annealed according to a cosine learning rate schedule over 5,000 iterations with a batch size of 1,024. (A training-setup sketch follows the table.) |
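The Dataset Splits row quotes a 20%/5% held-out protocol for the single-series experiments. The sketch below shows one plausible way to realize that split; the helper name `split_series` and the assumption that the validation block immediately precedes the test block are ours, not the paper's.

```python
import numpy as np

def split_series(y: np.ndarray):
    """Split one series into train/validation/test blocks along time.

    Hypothetical helper: the last 20% of consecutive observations are
    held out for testing and the 5% immediately before them for
    validation (the ordering of the two held-out blocks is an assumption).
    """
    n = len(y)
    n_test = int(round(0.20 * n))
    n_val = int(round(0.05 * n))
    train = y[: n - n_test - n_val]
    val = y[n - n_test - n_val : n - n_test]
    test = y[n - n_test:]
    return train, val, test

# Toy example: a series of length 100 yields 75 / 5 / 20 observations.
train, val, test = split_series(np.arange(100))
print(len(train), len(val), len(test))  # 75 5 20
```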
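The Software Dependencies row notes that barcodes are computed with Ripser. As a self-contained illustration of what such a barcode is, the following union-find sweep computes the 0-dimensional persistence pairs of a sublevel-set filtration of a short window; it is not the paper's pipeline, and the choice of a sublevel-set filtration here is an assumption made purely for illustration.

```python
import numpy as np

def sublevel_persistence_0d(x: np.ndarray):
    """0-dimensional persistence barcode of the sublevel-set filtration
    of a 1-D window, computed with a small union-find sweep.

    Illustration only: the paper computes barcodes with Ripser; this
    stand-alone version just shows what the (birth, death) pairs of a
    local time-series window look like.
    """
    n = len(x)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    bars, birth, active = [], {}, [False] * n
    for i in sorted(range(n), key=lambda k: x[k]):
        active[i], birth[i] = True, float(x[i])
        for j in (i - 1, i + 1):                      # left/right neighbours
            if 0 <= j < n and active[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the younger component (larger birth) dies now.
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < x[i]:               # skip zero-length bars
                    bars.append((birth[young], float(x[i])))
                parent[young] = old
    bars.append((float(np.min(x)), float("inf")))     # essential component
    return bars

# Barcode of one short toy window: two local minima -> two bars.
print(sublevel_persistence_0d(np.array([3.0, 1.0, 2.0, 0.5, 2.5])))
```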
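The Experiment Setup row lists, for the M4 runs, a 20-layer transformer encoder with four attention heads and per-component ADAM learning rates annealed with a cosine schedule over 5,000 iterations at batch size 1,024. The PyTorch sketch below wires up only that optimizer/schedule configuration; the `nn.Linear` stand-ins for N-BEATS, the MLP of Eq. (7), and the barcode vectorization, as well as the model dimension of 64, are placeholders of ours and not the paper's architecture.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the model parts named in the row above; only
# the optimizer / schedule wiring mirrors the quoted M4 hyperparameters.
nbeats_and_mlp = nn.Linear(64, 64)   # placeholder for N-BEATS + MLP of Eq. (7)
top_vec = nn.Linear(64, 64)          # placeholder for barcode vectorization
encoder = nn.TransformerEncoder(     # 20 layers, four attention heads
    nn.TransformerEncoderLayer(d_model=64, nhead=4), num_layers=20
)

# One parameter group per component, each with its quoted initial rate.
optimizer = torch.optim.Adam([
    {"params": nbeats_and_mlp.parameters(), "lr": 1e-3},
    {"params": top_vec.parameters(),        "lr": 8e-3},
    {"params": encoder.parameters(),        "lr": 5e-3},
])
# Cosine annealing of all three learning rates over 5,000 iterations.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5000)

for step in range(5000):
    optimizer.zero_grad()
    # ... forward pass on a batch of 1,024 windows, MSE loss, backward ...
    optimizer.step()
    scheduler.step()
```

Per-parameter-group learning rates in `torch.optim.Adam` are a natural way to express the three different initial rates, and `CosineAnnealingLR` anneals all groups jointly.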