TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series
Authors: Arjun Ashok, Étienne Marcotte, Valentina Zantedeschi, Nicolas Chapados, Alexandre Drouin
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We empirically show that our simplified training procedure leads to better training dynamics (e.g., faster convergence to better solutions) as well as state-of-the-art performance on a number of real-world forecasting tasks, while preserving the high flexibility of the TACTiS model (Sec. 5)." Also, from Sec. 5 (Experiments): "We start by empirically validating the two-stage approach to learning attentional copulas (Sec. 5.1). Then, we show that TACTiS-2 achieves state-of-the-art performance in a forecasting benchmark and that it can perform highly accurate interpolation (Sec. 5.2)." |
| Researcher Affiliation | Collaboration | Arjun Ashok1 2 3, Étienne Marcotte1, Valentina Zantedeschi1, Nicolas Chapados1 2, Alexandre Drouin1 2. 1ServiceNow Research, 2Mila-Québec AI Institute, 3Université de Montréal, Montréal, Canada. firstname.lastname@servicenow.com |
| Pseudocode | No | The paper describes the model architecture and training procedure in text and with diagrams, but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is made available at https://github.com/ServiceNow/TACTiS. |
| Open Datasets | Yes | "We now evaluate the forecasting and interpolation abilities of TACTiS-2 in a benchmark of five common real-world datasets from the Monash Time Series Forecasting Repository (Godahewa et al., 2021): electricity, fred-md, kdd-cup, solar-10min, and traffic." |
| Dataset Splits | Yes | "Validation set: During hyperparameter search, we reserve from the end of the training set a number of timesteps equal to 7 times the prediction length. The validation set is then built from all prediction windows that fit in this reserved data. During backtesting, we also remove this amount of data from the training set (except for fred-md, where we only remove a number of timesteps equal to the prediction length), but the validation set is built from the 7 (or, for fred-md, single) non-overlapping prediction windows we can get from this reserved data." |
| Hardware Specification | Yes | Compute used All models in the paper are trained in a Docker container with access to a Nvidia Tesla-P100 GPU (12 GB of memory), 2 CPU cores, and 32 GB of RAM. |
| Software Dependencies | No | The paper mentions using 'Optuna (Akiba et al., 2019)' for hyperparameter search and the 'R sandwich package (Zeileis et al., 2020)' for standard error computation. However, it does not provide specific version numbers (e.g., v1.x.y) for these or any other key software components like deep learning frameworks (e.g., PyTorch, TensorFlow) or Python versions. |
| Experiment Setup | Yes | Sections C.2 (Training Procedure), C.3 (Hyperparameter Search Protocol), and C.4 (Selected Hyperparameters) provide extensive details on the experimental setup. For example: "Batch size: The batch size was selected as the largest power of 2 between 1 and 256..."; "We stop the training when any of these conditions are reached: We have reached 72 hours of training for a model without the two-stage curriculum, We have reached 36 hours of training for a single stage of the curriculum, Or we did not observe any improvement in the best value of the NLL on the validation set for 50 epochs." Tables 7, 8, 9, and 10 list specific hyperparameters and their optimal values, such as 'Learning rate', 'Weight decay', 'Gradient clipping', 'Encoder transformer embedding size', and 'number of heads'. |
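The dataset-split scheme quoted above (reserve 7 times the prediction length from the end of the training set, then take the non-overlapping prediction windows that fit, as done during backtesting) can be sketched in a few lines. This is a minimal illustration, not the paper's code; the function name and signature are hypothetical.

```python
# Hypothetical sketch of the backtesting validation split described in the
# paper: reserve n_windows * prediction_length timesteps from the end of
# the training series, then cut that tail into non-overlapping windows.

def split_validation(series, prediction_length, n_windows=7):
    """Return (truncated training series, list of validation windows)."""
    reserved = n_windows * prediction_length
    train = series[:-reserved]          # remove reserved data from training
    tail = series[-reserved:]           # reserved timesteps at the end
    windows = [
        tail[i * prediction_length:(i + 1) * prediction_length]
        for i in range(n_windows)
    ]
    return train, windows

# Example on a toy series of 100 timesteps with prediction length 10:
train, windows = split_validation(list(range(100)), prediction_length=10)
# train keeps 30 timesteps; 7 non-overlapping windows of length 10 follow
```

For fred-md, the paper instead reserves only a single prediction window, which corresponds to `n_windows=1` in this sketch; the hyperparameter-search variant uses all (overlapping) windows that fit in the reserved data rather than only the non-overlapping ones.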