Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

AutoTimes: Autoregressive Time Series Forecasters via Large Language Models

Authors: Yong Liu, Guo Qin, Xiangdong Huang, Jianmin Wang, Mingsheng Long

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct thorough evaluations of the performance of AutoTimes, including time series forecasting, zero-shot forecasting, and the proposed in-context forecasting. Empirically, AutoTimes achieves state-of-the-art with 0.1% trainable parameters and over 5× training/inference speedup compared to advanced LLM-based forecasters. Compared with state-of-the-art methods, our repurposed forecaster achieves superior performance while saving over 80% training and inference time, and further exhibits zero-shot generalizability, in-context forecasting, and scaling behavior empowered by LLMs.
Researcher Affiliation | Academia | Yong Liu, Guo Qin, Xiangdong Huang, Jianmin Wang, Mingsheng Long; School of Software, BNRist, Tsinghua University, China
Pseudocode | Yes | Algorithm 1: AutoTimes Generate Text Embedding; Algorithm 2: AutoTimes Repurpose LLM; Algorithm 3: AutoTimes LLM Forecasting; Algorithm 4: AutoTimes Autoregressive Generation
Open Source Code | Yes | Code is available at this repository: https://github.com/thuml/AutoTimes.
Open Datasets | Yes | We conduct experiments to evaluate the performance of the proposed AutoTimes on seven real-world datasets spanning diverse domains: (1) ETTh1 [48]... (2) Weather [43]... (3) ECL [43]... (4) Traffic [43]... (5) Solar-Energy [18]... (6) M4 competition [25]... (7) M3...
Dataset Splits | Yes | We follow the same data processing and train-validation-test split protocol used in TimesNet [43], where the train, validation, and test sets are strictly divided in chronological order to ensure no data leakage. Table 8: Dataset Size denotes the total number of time points in the (Train, Validation, Test) splits, respectively.
Hardware Specification | Yes | All the experiments are conducted using PyTorch [29] on NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions PyTorch [29] but gives no version number for PyTorch or for any other software library or dependency used in this work.
Experiment Setup | Yes | We employ Adam [17] with an initial learning rate in {10⁻³, 5 × 10⁻⁴, 10⁻⁴} and MSE loss for model optimization. We set the number of training epochs to 10. The batch size is chosen from {256, 1024, 2048}. The number of layers is fixed at 2, and the hidden dimension is selected from {256, 512, 1024} according to the validation loss. The segment length is set to S = 96 for the multivariate datasets and to the prediction length (S = F) for M3 and M4.
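The Pseudocode row names Algorithm 4 (AutoTimes Autoregressive Generation) without giving its steps. As a rough illustration only, the sketch below shows generic segment-wise autoregressive rolling: the lookback window is cut into non-overlapping segments of length S, a one-step predictor maps the past segments to the next segment, and each prediction is appended to the context until F points exist. `predict_next_segment` is a hypothetical stand-in for the repurposed LLM together with its embedding and projection layers; it is not the AutoTimes API, and `repeat_last_segment` is a toy predictor for demonstration.

```python
from typing import Callable, List

Segment = List[float]


def autoregressive_forecast(
    history: List[float],
    horizon: int,
    segment_len: int,
    predict_next_segment: Callable[[List[Segment]], Segment],  # hypothetical LLM stand-in
) -> List[float]:
    """Roll a one-step segment predictor forward until `horizon` points are produced."""
    if len(history) % segment_len != 0:
        raise ValueError("history length must be a multiple of segment_len")
    # Split the lookback window into non-overlapping segments (one token per segment).
    segments = [history[i:i + segment_len] for i in range(0, len(history), segment_len)]
    forecast: List[float] = []
    while len(forecast) < horizon:
        nxt = predict_next_segment(segments)
        if len(nxt) != segment_len:
            raise ValueError("predictor must return exactly segment_len values")
        segments.append(nxt)  # feed the prediction back in as context
        forecast.extend(nxt)
    # Drop surplus points when horizon is not a multiple of segment_len.
    return forecast[:horizon]


def repeat_last_segment(segments: List[Segment]) -> Segment:
    # Toy predictor for illustration: the next segment repeats the most recent one.
    return list(segments[-1])
```

For example, `autoregressive_forecast([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 5, 3, repeat_last_segment)` returns `[4.0, 5.0, 6.0, 4.0, 5.0]`. This sketch lets the context grow without bound; a real model would presumably cap it at its context length.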
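The Dataset Splits row describes a strictly chronological train/validation/test split following TimesNet [43]. A minimal sketch of such a split on index ranges follows. The 70/10/20 default fractions and the `lookback` overlap (which lets the first validation and test input windows read the preceding history while their forecast targets stay inside the split) are assumptions for illustration; the row does not state the ratios, and Table 8 gives the actual split sizes.

```python
from typing import Dict, Tuple


def chronological_split(
    n_points: int,
    train_frac: float = 0.7,  # assumed default, not stated in the source
    test_frac: float = 0.2,   # assumed default, not stated in the source
    lookback: int = 0,
) -> Dict[str, Tuple[int, int]]:
    """Return half-open [start, end) index ranges for train/val/test in time order."""
    num_train = int(n_points * train_frac)
    num_test = int(n_points * test_frac)
    num_val = n_points - num_train - num_test  # the remainder goes to validation
    if min(num_train, num_val, num_test) <= 0:
        raise ValueError("split fractions leave an empty split")
    if lookback > num_train:
        raise ValueError("lookback is longer than the training period")
    return {
        # Training never touches any index at or after num_train.
        "train": (0, num_train),
        # Val/test start `lookback` steps early so their first input window has history;
        # forecast targets still lie in [num_train, num_train + num_val) and [n - num_test, n).
        "val": (num_train - lookback, num_train + num_val),
        "test": (n_points - num_test - lookback, n_points),
    }
```

With `n_points=1000` and `lookback=96`, this yields train (0, 700), validation (604, 800), and test (704, 1000); with `lookback=0` the three ranges are disjoint.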
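The Experiment Setup row reports a small search space. One way to reproduce the selection is an exhaustive grid search scored by validation loss, sketched below. `train_and_validate` is a hypothetical callback (it would train with Adam and MSE loss for the configured epochs and return the validation MSE). Selecting the learning rate and batch size by validation loss, as the row states for the hidden dimension, is an assumption, since the row only says they are chosen from their sets. The `segment_length` helper encodes the stated rule: S = 96 for multivariate datasets and S = F for M3 and M4.

```python
import itertools
from typing import Callable, Dict, Optional, Tuple, Union

Config = Dict[str, Union[int, float]]

# Search space as reported in the setup; fixed settings are merged into every config.
SEARCH_SPACE = {
    "learning_rate": [1e-3, 5e-4, 1e-4],
    "batch_size": [256, 1024, 2048],
    "hidden_dim": [256, 512, 1024],
}
FIXED = {"epochs": 10, "num_layers": 2}


def segment_length(dataset: str, pred_len: int) -> int:
    # S = F (the prediction length) for M3/M4; S = 96 for the multivariate datasets.
    return pred_len if dataset in {"M3", "M4"} else 96


def grid_search(
    train_and_validate: Callable[[Config], float],  # hypothetical: returns validation loss
) -> Tuple[Optional[Config], float]:
    best_cfg: Optional[Config] = None
    best_loss = float("inf")
    keys = list(SEARCH_SPACE)
    # Evaluate every combination (3 x 3 x 3 = 27 configurations).
    for values in itertools.product(*(SEARCH_SPACE[k] for k in keys)):
        cfg: Config = {**FIXED, **dict(zip(keys, values))}
        loss = train_and_validate(cfg)
        if loss < best_loss:  # strict comparison keeps the first config on ties
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```

With a toy objective such as `lambda c: abs(c["hidden_dim"] - 512) / 1000 + c["learning_rate"]`, the search picks a hidden dimension of 512 and a learning rate of 1e-4 after evaluating all 27 combinations.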