A decoder-only foundation model for time-series forecasting
Authors: Abhimanyu Das, Weihao Kong, Rajat Sen, Yichen Zhou
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a diverse set of previously unseen forecasting datasets suggest that the model can yield accurate zero-shot forecasts across different domains, forecasting horizons and temporal granularities. |
| Researcher Affiliation | Industry | Google Research. Correspondence to: Rajat Sen <senrajat@google.com>, Yichen Zhou <yichenzhou@google.com>. |
| Pseudocode | No | The paper describes the model architecture and training process in text and with a diagram (Figure 1), but it does not include formal pseudocode or an algorithm block. |
| Open Source Code | Yes | A version of TimesFM has been released on Hugging Face as timesfm-1.0-200m, with corresponding inference code (see the checkpoint-loading sketch below the table). |
| Open Datasets | Yes | We evaluate our model in zero-shot settings on three groups of well-known public datasets against the best performing baselines for each group. These datasets have been intentionally held out from our pretraining data. We address this problem by sourcing the bulk of the data used to train our models from three major sources: Google Trends, Wiki Pageview statistics and synthetic time-series. Google Trends. https://trends.google.com Wiki Pageviews. https://en.wikipedia.org/wiki/Wikipedia:Pageview_statistics |
| Dataset Splits | Yes | We report performance on the official metrics and scalings of the datasets, using either their standard test splits or common test splits in other literature. We follow the same protocol as in GPT4TS (Zhou et al., 2023) (see Table 13 in their paper). (Zhou et al., 2023) finetune GPT2 input and output blocks on long-term forecasting benchmarks on 10% of the original datasets and compare it against models trained from scratch on the same data. |
| Hardware Specification | Yes | All experiments were performed on a TPUv5e6 setup with 16 tensor-cores. For the 200M model it takes 2 days to complete 1.5M iterations on our setup. |
| Software Dependencies | No | The paper mentions software such as Hugging Face and implies the use of a deep-learning framework (e.g., PyTorch or TensorFlow, given the Google Research affiliation), but it does not specify version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | For our main 200M model we use 16 attention heads, 20 layers, an input patch length of 32 and an output patch length of 128. The model dimension is set to 1280. We train with layer norm and a cosine decay learning rate schedule with a peak learning rate of 5e-4. We train with a maximum context length of 512 whenever the length of the time-series allows that. For weekly granularity we do not have sufficiently long time-series; therefore a maximum context length of 256 is used. For the same reason, a maximum context length of 64 is used while training on monthly granularity data. (See the configuration sketch below the table.) |
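The Experiment Setup row lists the 200M-model hyperparameters and learning-rate schedule. The sketch below is one way to collect them into a configuration object with a cosine-decay schedule; the field names, the decay-to-zero endpoint, and the reuse of the 1.5M-iteration count from the Hardware Specification row as the schedule length are illustrative assumptions, not details confirmed by the paper.

```python
# Minimal sketch of the reported 200M-parameter configuration and a
# cosine-decay learning-rate schedule. Only the numeric values come from the
# paper; the field names and the decay endpoint (zero) are assumptions.
import math
from dataclasses import dataclass


@dataclass
class TimesFMConfig:
    num_heads: int = 16           # attention heads
    num_layers: int = 20          # decoder layers
    input_patch_len: int = 32     # input patch length
    output_patch_len: int = 128   # output patch length
    model_dim: int = 1280         # model dimension
    max_context_len: int = 512    # 256 for weekly data, 64 for monthly data
    peak_lr: float = 5e-4         # peak learning rate
    total_steps: int = 1_500_000  # iterations reported for the 200M model


def cosine_decay_lr(step: int, cfg: TimesFMConfig) -> float:
    """Cosine decay from the peak learning rate down to zero (assumed endpoint)."""
    progress = min(step, cfg.total_steps) / cfg.total_steps
    return 0.5 * cfg.peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    cfg = TimesFMConfig()
    for step in (0, cfg.total_steps // 2, cfg.total_steps):
        print(f"step {step}: lr = {cosine_decay_lr(step, cfg):.2e}")
```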
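The Open Source Code row points to a released checkpoint named timesfm-1.0-200m with accompanying inference code. One minimal way to fetch such a release is with the huggingface_hub client, as sketched below; the repository id google/timesfm-1.0-200m is an assumption inferred from the release name, not something stated in the paper.

```python
# Minimal sketch: downloading the released TimesFM checkpoint from Hugging Face.
# The repo id "google/timesfm-1.0-200m" is assumed from the release name quoted
# in the table above; it is not specified in the paper itself.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="google/timesfm-1.0-200m")
print(f"Checkpoint files downloaded to: {local_dir}")
```

The official timesfm package then loads these files for zero-shot inference; its exact API has changed across releases, so the repository README should be treated as the authoritative reference.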