Timer: Generative Pre-trained Transformers Are Large Time Series Models

Authors: Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, Mingsheng Long

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply Timer on various tasks, which is realized in our unified generative approach. Timer exhibits notable feasibility and generalization in each task, achieving state-of-the-art performance with few samples. By pre-training on increasing available time series data, Timer exhibits zero-shot forecasting capability. Quantitative evaluations and quality assessments are provided among concurrent large time series models.
Researcher Affiliation | Academia | School of Software, BNRist, Tsinghua University.
Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and datasets are available at: https://github.com/thuml/Large-Time-Series-Model.
Open Datasets | Yes | During pre-training, we curate large-scale datasets with up to 1 billion time points, unify heterogeneous time series into single-series sequence (S3) format, and develop the GPT-style architecture toward LTSMs. ... Code and datasets are available at: https://github.com/thuml/Large-Time-Series-Model. We curate Unified Time Series Dataset (UTSD) as shown in Figure 2. ... We release four volumes on https://huggingface.co/datasets/thuml/UTSD.
Dataset Splits | Yes | Each series representing a variate will be divided into training and validation splits at a ratio of 9:1 for pre-training. ... In order to maintain comparability with previous benchmarks, we keep the same validation and testing sets of original downstream datasets and train the baseline model and Timer with the same set of training samples. (A hedged loading and splitting sketch follows this table.)
Hardware Specification | Yes | All experiments are implemented in PyTorch (Paszke et al., 2019) and trained using NVIDIA A100 Tensor Core GPU.
Software Dependencies | No | The paper states: "All experiments are implemented in PyTorch (Paszke et al., 2019) and trained using NVIDIA A100 Tensor Core GPU. We use AdamW (Kingma & Ba, 2015) as the optimizer...". While PyTorch and AdamW are mentioned with citations, specific version numbers (e.g., PyTorch 1.x.y) are not explicitly stated.
Experiment Setup | Yes | The base learning rate is 5e-5, and the final learning rate is 2e-6. The decay steps are proportional to the number of training steps of 10 epochs. During pre-training, we use N = 15 as the number of tokens, and the batch size is set to 8192. Configurations for downstream adaptation are listed in Table 8 ("Detailed explanation of model hyperparameters and corresponding parameter quantities"). We adopt the learning rate schedule strategy with exponential decay at a base of 0.5 under all three downstream tasks. (A hedged optimizer and schedule sketch follows this table.)
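
To make the Open Datasets and Dataset Splits rows concrete, here is a minimal sketch of loading one UTSD volume from Hugging Face and applying the quoted 9:1 per-series train/validation split. The subset name "UTSD-1G" and the record field "target" are assumptions about the dataset layout, not details stated in the report.

```python
# Minimal sketch: load a UTSD volume and split each series 9:1 (train/validation).
# The subset name "UTSD-1G" and the field name "target" are assumptions.
from datasets import load_dataset

utsd = load_dataset("thuml/UTSD", "UTSD-1G", split="train")

def split_series(values, train_ratio=0.9):
    """Divide one single-variate series into training and validation segments."""
    cut = int(len(values) * train_ratio)
    return values[:cut], values[cut:]

train_parts, val_parts = [], []
for record in utsd:
    series = record["target"]          # assumed field holding the raw time points
    tr, va = split_series(series)
    train_parts.append(tr)
    val_parts.append(va)
```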
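
The Experiment Setup row quotes a base learning rate of 5e-5 decayed to a final rate of 2e-6, with decay steps proportional to 10 epochs of training, plus an exponential decay at a base of 0.5 for downstream adaptation. Below is a minimal PyTorch sketch of such a schedule; the linear decay shape, the placeholder model, and the steps-per-epoch value are assumptions, since the report only quotes the endpoints.

```python
# Sketch of the reported optimization setup: AdamW, base LR 5e-5 decayed to 2e-6
# over roughly 10 epochs of steps. Decay shape, model, and steps_per_epoch are assumed.
import torch

model = torch.nn.Linear(96, 96)        # placeholder module standing in for Timer

base_lr, final_lr = 5e-5, 2e-6
steps_per_epoch = 1000                 # assumed; depends on corpus size and batch size (8192)
decay_steps = 10 * steps_per_epoch

optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    # Linearly decay the multiplier from 1.0 down to final_lr / base_lr, then hold it.
    lr_lambda=lambda step: max(final_lr / base_lr,
                               1.0 - (1.0 - final_lr / base_lr) * step / decay_steps),
)
# Call scheduler.step() once per training step after optimizer.step().

# For downstream adaptation, the quoted exponential decay at a base of 0.5 would
# correspond to torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.5).
```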