Unified Training of Universal Time Series Forecasting Transformers

Authors: Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, Doyen Sahoo

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment. Each entry below lists the reproducibility variable, the result, and the supporting LLM response.
Research Type: Experimental. "Trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains, MOIRAI achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models. Code, data, and model weights can be found at https://github.com/SalesforceAIResearch/uni2ts." and "We perform experimental evaluations on both in- and out-of-distribution settings, and show that MOIRAI consistently achieves competitive or superior performance compared to state-of-the-art full-shot baselines."
Researcher Affiliation: Collaboration. 1) Salesforce AI Research; 2) School of Computing and Information Systems, Singapore Management University.
Pseudocode: No. The paper describes the architecture and training procedures in narrative text and diagrams, but does not include formal pseudocode or algorithm blocks.
Open Source Code: Yes. "Code, data, and model weights can be found at https://github.com/SalesforceAIResearch/uni2ts."
Open Datasets: Yes. "To power the training of our Large Time Series Model (LTM), we introduce the Large-scale Open Time Series Archive (LOTSA), the largest collection of open time series datasets with 27B observations across nine domains. (...) LOTSA, the model weights, and our library for unified training of universal time series models, UNI2TS, will be fully open sourced."
Dataset Splits: Yes. "We take the validation set to be the last forecast horizon before the test set, and the train set to be everything before that." and "For MOIRAI, we perform inference-time tuning, selecting context length from {1000, 2000, 3000, 4000, 5000} and patch sizes based on frequency, on the validation CRPS."
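A minimal sketch of the stated split convention, assuming a univariate NumPy array and a fixed forecast horizon; the function name and example sizes are illustrative, not from the paper:

```python
import numpy as np

def split_series(y: np.ndarray, horizon: int):
    """Hold out the last `horizon` points as test, the preceding `horizon`
    points as validation, and use everything before that as train,
    matching the split rule quoted above."""
    test = y[-horizon:]
    val = y[-2 * horizon:-horizon]
    train = y[:-2 * horizon]
    return train, val, test

# Example: a 1,000-step series with a 96-step forecast horizon.
series = np.arange(1000, dtype=float)
train, val, test = split_series(series, horizon=96)
assert len(train) == 808 and len(val) == 96 and len(test) == 96
```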
Hardware Specification: Yes. "Models are trained on NVIDIA A100-40G GPUs with TF32 precision."
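The paper states only that training used TF32 precision on A100 GPUs. Assuming a PyTorch training stack (an assumption, not confirmed by the quote), TF32 is typically enabled like this:

```python
import torch

# Assumption: PyTorch is used. These flags enable TF32 tensor cores on
# Ampere GPUs such as the A100 for matmuls and cuDNN convolutions.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
# Equivalent high-level switch, available in PyTorch >= 1.12:
torch.set_float32_matmul_precision("high")
```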
Software Dependencies: No. The paper mentions the use of the AdamW optimizer, states that baselines were "all implemented with the GluonTS library (Alexandrov et al., 2020)", and notes that a "unified storage format using Arrow (Richardson et al., 2023)" was designed. However, it does not provide specific version numbers for these software components, which are crucial for reproducibility.
Experiment Setup: Yes. "The small model is trained for 100,000 steps, while base and large models are trained for 1,000,000 steps with a batch size of 256. For optimization, we use the AdamW optimizer with the following hyperparameters: lr = 1e-3, weight decay = 1e-1, β1 = 0.9, β2 = 0.98. We also apply a learning rate scheduler with linear warmup for the first 10,000 steps, and cosine annealing thereafter."
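A hedged sketch of the reported optimizer and learning-rate schedule in PyTorch. Only the hyperparameter values quoted above come from the paper; the placeholder model, the total step count used here, and the exact shape of the cosine phase (e.g. its final learning rate) are assumptions:

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(32, 32)   # placeholder, not the actual MOIRAI model
total_steps = 100_000             # small model; 1,000,000 for base and large
warmup_steps = 10_000

# AdamW with the quoted hyperparameters: lr=1e-3, weight decay=1e-1,
# beta1=0.9, beta2=0.98.
optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-1,
                  betas=(0.9, 0.98))

def lr_lambda(step: int) -> float:
    """Linear warmup for the first `warmup_steps`, then cosine annealing
    over the remaining steps (annealing to zero is an assumption)."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    # forward pass, loss computation, and loss.backward() would go here,
    # with the reported batch size of 256
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```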