Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ForecastPFN: Synthetically-Trained Zero-Shot Forecasting
Authors: Samuel Dooley, Gurnoor Singh Khurana, Chirag Mohapatra, Siddartha V Naidu, Colin White
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we show that zero-shot predictions made by ForecastPFN are more accurate and faster compared to state-of-the-art forecasting methods, even when the other methods are allowed to train on hundreds of additional in-distribution data points. |
| Researcher Affiliation | Collaboration | 1 Abacus.AI, 2 Caltech |
| Pseudocode | No | The paper describes the model architecture and training procedure in detail, but it does not include any explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | Our codebase and our model are available at https://github.com/abacusai/forecastpfn. |
| Open Datasets | Yes | To ensure a fair comparison, we evaluate on seven popular, real-world datasets across energy systems, economics, traffic, and weather: ECL (Electricity Consuming Load) [46], ETT 1 and 2 (Electricity Transformer Temperature) [52], Exchange [28], Illness [17], Traffic [39], and Weather [1]. |
| Dataset Splits | Yes | All non-zero-shot methods are allowed to train on $\{(t, y_t)\}_{t=500-x}^{500}$. Then, at test time, all algorithms see the 36 input data points and make a prediction length of ℓ, e.g., input of $\{(t, y_t)\}_{t=501}^{536}$ and make predictions for timesteps t = 537 to 537 + ℓ. We allow algorithms to use 10% of their data budget on validation. |
| Hardware Specification | Yes | Each epoch consists of 1 024 000 tasks, and we trained the transformer for 600 epochs with the Adam optimizer [25] on a single Tesla V100 16GB GPU, which took 30 hours. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'pmdarima [43]' for ARIMA, and 'official codebase' for other methods, but it does not specify version numbers for Python, PyTorch, or any other libraries/frameworks. |
| Experiment Setup | Yes | We set the input length ℓ = 100 and the maximum prediction of 10 steps into the future. Each epoch consists of 1 024 000 tasks, and we trained the transformer for 600 epochs with the Adam optimizer [25] on a single Tesla V100 16GB GPU, which took 30 hours. We use the Adam optimizer with a learning rate of 0.0001, and MSE loss. |
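
The index arithmetic in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_indices` and its parameters are hypothetical. It assumes the training window is the `budget` timesteps ending at t = 500 (the lower bound is garbled in the quoted text), that validation takes the most recent 10% of that window, and that a prediction length of ℓ means ℓ timesteps starting at t = 537.

```python
def split_indices(budget, pred_len, train_end=500, input_len=36, val_percent=10):
    """Timestep ranges for the quoted evaluation protocol (illustrative sketch)."""
    if budget < 0 or pred_len < 1:
        raise ValueError("budget must be >= 0 and pred_len must be >= 1")

    # Assumption: the data budget is the `budget` timesteps ending at train_end.
    fit_window = range(train_end - budget + 1, train_end + 1)

    # Assumption: validation is the most recent 10% of the budget
    # (the source only says 10% of the budget may be used for validation).
    n_val = budget * val_percent // 100
    cut = len(fit_window) - n_val
    train, val = fit_window[:cut], fit_window[cut:]

    # From the source: every method sees 36 input points, t = 501..536.
    inputs = range(train_end + 1, train_end + 1 + input_len)

    # Assumption: pred_len timesteps are forecast, starting right after the inputs.
    targets = range(inputs.stop, inputs.stop + pred_len)

    return {"train": train, "val": val, "inputs": inputs, "targets": targets}


if __name__ == "__main__":
    for name, steps in split_indices(budget=50, pred_len=6).items():
        print(f"{name}: t = {steps.start}..{steps.stop - 1} ({len(steps)} points)")
```

A budget of 0 corresponds to the zero-shot setting: the training and validation ranges are empty, and only the 36 input points are available at test time.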