Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
This Time is Different: An Observability Perspective on Time Series Foundation Models
Authors: Ben Cohen, Emaad Khwaja, Youssef Doubli, Salahidine Lemaachi, Chris Lettieri, Charles Masson, Hugo Miccinilli, Elise Ramé, Qiqi Ren, Afshin Rostamizadeh, Jean du Terrail, Anna-Monica Toon, Kan Wang, Stephan Xie, Zongzhe Xu, Viktoriya Zhukova, David Asker, Ameet S Talwalkar, Othmane Abou-Amal
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations demonstrate that TOTO achieves state-of-the-art performance on both BOOM and on established general purpose time series forecasting benchmarks. TOTO's model weights (https://huggingface.co/Datadog/Toto-Open-Base-1.0), inference code, and evaluation scripts (https://github.com/DataDog/toto), as well as BOOM's data and evaluation code (https://huggingface.co/datasets/Datadog/BOOM), are all available as open source under the Apache 2.0 License. |
| Researcher Affiliation | Collaboration | Datadog AI Research; Carnegie Mellon University. AT and SX contributed to this work in their Datadog capacities. |
| Pseudocode | Yes | Listing 1: Vectorized PyTorch implementation of Welford's algorithm for computing causal statistics |
| Open Source Code | Yes | TOTO's model weights (https://huggingface.co/Datadog/Toto-Open-Base-1.0), inference code, and evaluation scripts (https://github.com/DataDog/toto), as well as BOOM's data and evaluation code (https://huggingface.co/datasets/Datadog/BOOM), are all available as open source under the Apache 2.0 License. |
| Open Datasets | Yes | BOOM (Benchmark of Observability Metrics). We introduce an open-source benchmark specifically for observability time series. BOOM includes a large-scale, novel dataset with 350 million observations across 2,807 distinct multivariate time series, approximately twice the size of the general-purpose GIFT-Eval benchmark [3]. [...] BOOM's data and evaluation code (https://huggingface.co/datasets/Datadog/BOOM), are all available as open source under the Apache 2.0 License. |
| Dataset Splits | Yes | To evaluate models on BOOM we closely follow the evaluation methodology proposed by Aksu et al. [1] for GIFT-Eval, including its standardized prediction lengths, strides, and train/validation/test splits. [...] Series are then normalized using the first 90% split of points that will be used for the context windows. We further filter out variates exhibiting abnormal scale change in the test split (final 10% of points)... |
| Hardware Specification | Yes | We ran all experiments, including hyperparameter tuning, final model training, and benchmark evaluation on a GPU cluster consisting of A100s and H100s. |
| Software Dependencies | No | No specific version numbers are provided for key software dependencies such as PyTorch, Optuna, AdamW, or xformers; only the names of the tools and libraries are given. |
| Experiment Setup | Yes | The resulting hyperparameter configuration described in Table 6 obtained the best multistep (average of 96 and 192) MAE on the Datadog validation set. Table 6 (Hyperparameters for Toto): Embedding Dimension: 768; MLP Dimension: 3072; # Layers: 12; # Heads: 12; # Variates: 32; Spacewise Layer Cadence: 12; Patch Size: 64; # T Mixture Model Components: 24; Annealing Schedule: WSD; Optimizer: AdamW; (β1, β2): (0.9579, 0.9581); Weight Decay: 0.0014; Initial Learning Rate: 0.0005; Warmup Steps: 6784; Stable Steps: 112,255; Decay Steps: 15,962; Batch Size: 128; Total Train Steps: 135,001; LRobust α: 0.0000; LRobust δ: 0.1010; λNLL: 0.5755; κ: 10. |
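
The Pseudocode row above cites a listing that uses Welford's algorithm to compute causal statistics, meaning the mean and variance at each time step depend only on points seen so far. As a rough illustration of that idea, here is a minimal plain-Python sketch. It is not the authors' vectorized PyTorch listing: the function name `causal_mean_var` and the choice of population variance (dividing by n) are assumptions made for this example.

```python
from typing import List, Tuple


def causal_mean_var(xs: List[float]) -> Tuple[List[float], List[float]]:
    """Running mean and population variance after each point (hypothetical helper).

    Uses Welford's one-pass update, so step t only sees xs[0..t].
    """
    means: List[float] = []
    variances: List[float] = []
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for n, x in enumerate(xs, start=1):
        delta = x - mean           # deviation from the previous mean
        mean += delta / n          # update the mean with the new point
        m2 += delta * (x - mean)   # uses both the old and the new mean
        means.append(mean)
        variances.append(m2 / n)   # assumption: population variance
    return means, variances


if __name__ == "__main__":
    running_means, running_vars = causal_mean_var([1.0, 2.0, 3.0, 4.0])
    print(running_means)
    print(running_vars)
```

Because each output depends only on earlier inputs, statistics of this kind can be used to normalize a series without leaking information from future points.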