Timer: Generative Pre-trained Transformers Are Large Time Series Models

Authors: Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, Mingsheng Long

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply Timer on various tasks, which is realized in our unified generative approach. Timer exhibits notable feasibility and generalization in each task, achieving state-of-the-art performance with few samples. By pre-training on increasing available time series data, Timer exhibits zero-shot forecasting capability. Quantitative evaluations and quality assessments are provided among concurrent large time series models.
Researcher Affiliation | Academia | School of Software, BNRist, Tsinghua University.
Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and datasets are available at: https://github.com/thuml/Large-Time-Series-Model.
Open Datasets | Yes | During pre-training, we curate large-scale datasets with up to 1 billion time points, unify heterogeneous time series into single-series sequence (S3) format, and develop the GPT-style architecture toward LTSMs. ... Code and datasets are available at: https://github.com/thuml/Large-Time-Series-Model. We curate Unified Time Series Dataset (UTSD) as shown in Figure 2. ... We release four volumes on https://huggingface.co/datasets/thuml/UTSD.
Dataset Splits | Yes | Each series representing a variate will be divided into training and validation splits at a ratio of 9:1 for pre-training. ... In order to maintain comparability with previous benchmarks, we keep the same validation and testing sets of original downstream datasets and train the baseline model and Timer with the same set of training samples. (A hedged loading and splitting sketch follows this table.)
Hardware Specification | Yes | All experiments are implemented in PyTorch (Paszke et al., 2019) and trained using NVIDIA A100 Tensor Core GPU.
Software Dependencies | No | The paper states: "All experiments are implemented in PyTorch (Paszke et al., 2019) and trained using NVIDIA A100 Tensor Core GPU. We use AdamW (Kingma & Ba, 2015) as the optimizer...". While PyTorch and AdamW are mentioned with citations, specific version numbers (e.g., PyTorch 1.x.y) are not explicitly stated.
Experiment Setup | Yes | The base learning rate is 5e-5, and the final learning rate is 2e-6. The decay steps are proportional to the number of training steps of 10 epochs. During pre-training, we use N = 15 as the number of tokens, and the batch size is set to 8192. Configurations for downstream adaptation are listed in Table 8 ("Detailed explanation of model hyperparameters and corresponding parameter quantities"). We adopt the learning rate schedule strategy with exponential decay at a base of 0.5 under all three downstream tasks. (A hedged optimizer and schedule sketch follows this table.)
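
To make the Open Datasets and Dataset Splits rows concrete, here is a minimal sketch of loading one UTSD volume from Hugging Face and applying the quoted 9:1 per-series train/validation split. The subset name "UTSD-1G" and the record field "target" are assumptions about the dataset layout, not details stated in the report.

```python
# Minimal sketch: load a UTSD volume and split each series 9:1 (train/validation).
# The subset name "UTSD-1G" and the field name "target" are assumptions.
from datasets import load_dataset

utsd = load_dataset("thuml/UTSD", "UTSD-1G", split="train")

def split_series(values, train_ratio=0.9):
    """Divide one single-variate series into training and validation segments."""
    cut = int(len(values) * train_ratio)
    return values[:cut], values[cut:]

train_parts, val_parts = [], []
for record in utsd:
    series = record["target"]          # assumed field holding the raw time points
    tr, va = split_series(series)
    train_parts.append(tr)
    val_parts.append(va)
```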
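
The Experiment Setup row quotes a base learning rate of 5e-5 decayed to a final rate of 2e-6, with decay steps proportional to 10 epochs of training, plus an exponential decay at a base of 0.5 for downstream adaptation. Below is a minimal PyTorch sketch of such a schedule; the linear decay shape, the placeholder model, and the steps-per-epoch value are assumptions, since the report only quotes the endpoints.

```python
# Sketch of the reported optimization setup: AdamW, base LR 5e-5 decayed to 2e-6
# over roughly 10 epochs of steps. Decay shape, model, and steps_per_epoch are assumed.
import torch

model = torch.nn.Linear(96, 96)        # placeholder module standing in for Timer

base_lr, final_lr = 5e-5, 2e-6
steps_per_epoch = 1000                 # assumed; depends on corpus size and batch size (8192)
decay_steps = 10 * steps_per_epoch

optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    # Linearly decay the multiplier from 1.0 down to final_lr / base_lr, then hold it.
    lr_lambda=lambda step: max(final_lr / base_lr,
                               1.0 - (1.0 - final_lr / base_lr) * step / decay_steps),
)
# Call scheduler.step() once per training step after optimizer.step().

# For downstream adaptation, the quoted exponential decay at a base of 0.5 would
# correspond to torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.5).
```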