ForecastPFN: Synthetically-Trained Zero-Shot Forecasting

Authors: Samuel Dooley, Gurnoor Singh Khurana, Chirag Mohapatra, Siddartha V Naidu, Colin White

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments, we show that zero-shot predictions made by ForecastPFN are more accurate and faster compared to state-of-the-art forecasting methods, even when the other methods are allowed to train on hundreds of additional in-distribution data points."
Researcher Affiliation | Collaboration | 1 Abacus.AI, 2 Caltech
Pseudocode | No | The paper describes the model architecture and training procedure in detail, but it does not include any explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | "Our codebase and our model are available at https://github.com/abacusai/forecastpfn."
Open Datasets | Yes | "To ensure a fair comparison, we evaluate on seven popular, real-world datasets across energy systems, economics, traffic, and weather: ECL (Electricity Consuming Load) [46], ETT 1 and 2 (Electricity Transformer Temperature) [52], Exchange [28], Illness [17], Traffic [39], and Weather [1]."
Dataset Splits | Yes | "All non-zero-shot methods are allowed to train on $\{(t, y_t)\}_{t=500-x}^{500}$. Then, at test time, all algorithms see the 36 input data points and make predictions of length ℓ, e.g., input of $\{(t, y_t)\}_{t=501}^{536}$ and make predictions for timesteps t = 537 to 537 + ℓ. We allow algorithms to use 10% of their data budget on validation." (See the split sketch below the table.)
Hardware Specification | Yes | "Each epoch consists of 1,024,000 tasks, and we trained the transformer for 600 epochs with the Adam optimizer [25] on a single Tesla V100 16GB GPU, which took 30 hours."
Software Dependencies | No | The paper mentions the 'Adam optimizer', 'pmdarima [43]' for ARIMA, and the 'official codebase' for the other methods, but it does not specify version numbers for Python, PyTorch, or any other libraries/frameworks.
Experiment Setup | Yes | "We set the input length ℓ = 100 and the maximum prediction length to 10 steps into the future. Each epoch consists of 1,024,000 tasks, and we trained the transformer for 600 epochs with the Adam optimizer [25] on a single Tesla V100 16GB GPU, which took 30 hours. We use the Adam optimizer with a learning rate of 0.0001, and MSE loss." (See the training sketch below the table.)
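To make the Dataset Splits row concrete, here is a minimal indexing sketch. It is not taken from the authors' repository: the function names, the 0-indexed array convention, and the validation helper are illustrative assumptions; only the window boundaries (train on timesteps 500−x through 500, test input 501–536, prediction horizon of ℓ steps after 536, 10% of the budget for validation) come from the quoted text.

```python
import numpy as np

def make_splits(y: np.ndarray, x: int, ell: int):
    """Split a univariate series per the quoted evaluation protocol.

    y    : series indexed from timestep 1 in the paper, stored 0-indexed here
    x    : training data budget for non-zero-shot methods
    ell  : prediction length
    """
    train = y[499 - x : 500]          # 1-indexed timesteps 500-x .. 500 (non-zero-shot methods only)
    test_input = y[500 : 536]         # 1-indexed timesteps 501 .. 536 (36 points seen by all methods)
    test_target = y[536 : 536 + ell]  # the ell timesteps to be predicted after timestep 536
    return train, test_input, test_target

def train_val_split(train: np.ndarray, val_frac: float = 0.1):
    """Reserve 10% of the training budget for validation (assumed tail split)."""
    n_val = int(len(train) * val_frac)
    return train[: len(train) - n_val], train[len(train) - n_val :]

# Example usage on a synthetic series long enough to cover timestep 536 + ell.
series = np.random.randn(600)
train, test_in, test_out = make_splits(series, x=250, ell=30)
```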
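The Experiment Setup row can likewise be summarized as a schematic training loop. This is a sketch under stated assumptions, not the released implementation: `TinyForecaster`, `sample_synthetic_batch`, and the batch size of 512 are hypothetical placeholders for the paper's transformer and synthetic-data prior, and only the hyperparameters quoted above (input length 100, horizon up to 10 steps, 600 epochs of 1,024,000 synthetic tasks, Adam with learning rate 0.0001, MSE loss) come from the paper.

```python
import torch
from torch import nn, optim

INPUT_LEN, MAX_HORIZON = 100, 10            # from the quoted setup
EPOCHS, TASKS_PER_EPOCH = 600, 1_024_000    # from the quoted setup
BATCH_SIZE = 512                            # assumption, not stated in the quote

class TinyForecaster(nn.Module):
    """Stand-in for the paper's transformer: maps a length-100 history to 10 future steps."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(INPUT_LEN, 256), nn.ReLU(), nn.Linear(256, MAX_HORIZON)
        )

    def forward(self, history):             # history: (batch, INPUT_LEN)
        return self.net(history)            # -> (batch, MAX_HORIZON)

def sample_synthetic_batch(batch_size: int):
    """Placeholder for the paper's synthetic prior; returns random series here."""
    history = torch.randn(batch_size, INPUT_LEN)
    target = torch.randn(batch_size, MAX_HORIZON)
    return history, target

model = TinyForecaster()
opt = optim.Adam(model.parameters(), lr=1e-4)  # Adam, learning rate 0.0001
loss_fn = nn.MSELoss()                         # MSE loss

for epoch in range(EPOCHS):
    for _ in range(TASKS_PER_EPOCH // BATCH_SIZE):
        history, target = sample_synthetic_batch(BATCH_SIZE)
        opt.zero_grad()
        loss = loss_fn(model(history), target)
        loss.backward()
        opt.step()
```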