A decoder-only foundation model for time-series forecasting
Authors: Abhimanyu Das, Weihao Kong, Rajat Sen, Yichen Zhou
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a diverse set of previously unseen forecasting datasets suggest that the model can yield accurate zero-shot forecasts across different domains, forecasting horizons and temporal granularities. |
| Researcher Affiliation | Industry | Google Research. Correspondence to: Rajat Sen <senrajat@google.com>, Yichen Zhou <yichenzhou@google.com>. |
| Pseudocode | No | The paper describes the model architecture and training process in text and with a diagram (Figure 1), but it does not include formal pseudocode or an algorithm block. |
| Open Source Code | Yes | A version of TimesFM has been released on Hugging Face as timesfm-1.0-200m, with corresponding inference code (see the checkpoint-loading sketch below the table). |
| Open Datasets | Yes | We evaluate our model in zero-shot settings on three groups of well-known public datasets against the best performing baselines for each group. These datasets have been intentionally held out from our pretraining data. We address this problem by sourcing the bulk of the data used to train our models from three major sources: Google Trends, Wiki Pageview statistics and synthetic time-series. Google Trends. https://trends.google.com Wiki Pageviews. https://en.wikipedia.org/wiki/Wikipedia:Pageview_statistics |
| Dataset Splits | Yes | We report performance on the official metrics and scalings of the datasets, using either their standard test splits or common test splits in other literature. We follow the same protocol as in GPT4TS (Zhou et al., 2023) (see Table 13 in their paper). (Zhou et al., 2023) finetune GPT2 input and output blocks on long-term forecasting benchmarks on 10% of the original datasets and compare it against models trained from scratch on the same data. |
| Hardware Specification | Yes | All experiments were performed on a TPUv5e6 setup with 16 tensor-cores. For the 200M model it takes 2 days to complete 1.5M iterations on our setup. |
| Software Dependencies | No | The paper mentions software such as Hugging Face and implies the use of a deep-learning framework (e.g., PyTorch or TensorFlow, given the Google Research affiliation), but it does not specify version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | For our main 200M model we use 16 attention heads, 20 layers, an input patch length of 32 and an output patch length of 128. The model dimension is set to 1280. We train with layer norm and a cosine decay learning rate schedule with a peak learning rate of 5e-4. We train with a maximum context length of 512 whenever the length of the time-series allows that. For weekly granularity we do not have sufficiently long time-series; therefore a maximum context length of 256 is used. For the same reason, a maximum context length of 64 is used while training on monthly granularity data. (See the configuration sketch below the table.) |
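The Experiment Setup row lists the 200M-model hyperparameters and learning-rate schedule. The sketch below is one way to collect them into a configuration object with a cosine-decay schedule; the field names, the decay-to-zero endpoint, and the reuse of the 1.5M-iteration count from the Hardware Specification row as the schedule length are illustrative assumptions, not details confirmed by the paper.

```python
# Minimal sketch of the reported 200M-parameter configuration and a
# cosine-decay learning-rate schedule. Only the numeric values come from the
# paper; the field names and the decay endpoint (zero) are assumptions.
import math
from dataclasses import dataclass


@dataclass
class TimesFMConfig:
    num_heads: int = 16           # attention heads
    num_layers: int = 20          # decoder layers
    input_patch_len: int = 32     # input patch length
    output_patch_len: int = 128   # output patch length
    model_dim: int = 1280         # model dimension
    max_context_len: int = 512    # 256 for weekly data, 64 for monthly data
    peak_lr: float = 5e-4         # peak learning rate
    total_steps: int = 1_500_000  # iterations reported for the 200M model


def cosine_decay_lr(step: int, cfg: TimesFMConfig) -> float:
    """Cosine decay from the peak learning rate down to zero (assumed endpoint)."""
    progress = min(step, cfg.total_steps) / cfg.total_steps
    return 0.5 * cfg.peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    cfg = TimesFMConfig()
    for step in (0, cfg.total_steps // 2, cfg.total_steps):
        print(f"step {step}: lr = {cosine_decay_lr(step, cfg):.2e}")
```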
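The Open Source Code row points to a released checkpoint named timesfm-1.0-200m with accompanying inference code. One minimal way to fetch such a release is with the huggingface_hub client, as sketched below; the repository id google/timesfm-1.0-200m is an assumption inferred from the release name, not something stated in the paper.

```python
# Minimal sketch: downloading the released TimesFM checkpoint from Hugging Face.
# The repo id "google/timesfm-1.0-200m" is assumed from the release name quoted
# in the table above; it is not specified in the paper itself.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="google/timesfm-1.0-200m")
print(f"Checkpoint files downloaded to: {local_dir}")
```

The official timesfm package then loads these files for zero-shot inference; its exact API has changed across releases, so the repository README should be treated as the authoritative reference.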