Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Online Time Series Forecasting with Theoretical Guarantees

Authors: Zijian Li, Changze Zhou, Minghao Fu, Sanjay Manjunath, Fan Feng, Guangyi Chen, Yingyao Hu, Ruichu Cai, Kun Zhang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiment results on synthetic data support our theoretical claims. Moreover, plugin implementations built on several baselines yield general improvement across multiple benchmarks, highlighting the effectiveness in real-world applications. Extensive experiments on synthetic and real-world benchmarks confirm our theoretical guarantees and demonstrate consistent improvements over several baselines.
Researcher Affiliation	Academia	1 Carnegie Mellon University; 2 Mohamed bin Zayed University of Artificial Intelligence; 3 Guangdong University of Technology; 4 Johns Hopkins University; 5 University of California San Diego
Pseudocode	No	The paper describes the model architecture and mathematical equations for the loss function, but it does not contain a clearly labeled pseudocode block or algorithm steps.
Open Source Code	Yes	5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have provided the code in the supplementary material.
Open Datasets	Yes	Datasets We follow the setting of Wen et al. [2023] and consider the following datasets. ETT is an electricity transformer temperature dataset, which contains two separate datasets {ETTh2, ETTm1}. Exchange is the daily exchange rate dataset from eight foreign countries. Weather is recorded at the Weather Station at the Max Planck Institute for Biogeochemistry in Germany. ECL is an electricity-consuming load dataset with the electricity consumption. Traffic is a dataset of traffic speeds collected from the California Transportation Agencies Performance Measurement System. Footnotes for ECL, Traffic, and Weather provide URLs: 4 https://archive.ics.uci.edu/dataset/321/electricityloaddiagrams20112014; 5 https://pems.dot.ca.gov/; 6 https://www.bgc-jena.mpg.de/wetter/
Dataset Splits	Yes	D.1.1 Data Generation Process: The total size of the dataset is 100,000, with 1,024 samples designated as the validation set. The remaining samples are the training set.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. While the NeurIPS Paper Checklist states 'We have provided in the implementation details' for computer resources, these details are not present in Appendix D.2.2 or any other section of the paper.
Software Dependencies	No	The paper mentions implementing parts of the method using PyTorch in Appendix D.2.2 but does not specify its version or any other software dependencies with version numbers.
Experiment Setup	No	The paper describes model architectures in detail in Appendix D.2.2 but does not specify concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings for the experiments. It mentions hyperparameters α, β, and γ in the total loss equation (11) but their specific values are not provided.
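
Illustrative sketch of the split reported under Dataset Splits: the synthetic dataset has 100,000 samples, of which 1,024 form the validation set and the rest the training set. Only those two sizes come from the entry above. Where the validation block sits (here, the last 1,024 samples, kept in order) is an assumption, and the name split_indices is hypothetical, not taken from the paper's code.

```python
# Sketch only: the sizes come from the Dataset Splits row above.
# ASSUMPTION: the validation set is the final contiguous block of samples;
# the paper excerpt does not say how the 1,024 samples are chosen.
TOTAL_SAMPLES = 100_000
VAL_SAMPLES = 1_024


def split_indices(total=TOTAL_SAMPLES, n_val=VAL_SAMPLES):
    """Return (train_indices, val_indices) as ranges, preserving sample order."""
    if not 0 < n_val < total:
        raise ValueError("n_val must be between 1 and total - 1")
    n_train = total - n_val
    return range(0, n_train), range(n_train, total)


train_idx, val_idx = split_indices()
print(len(train_idx), len(val_idx))  # 98976 1024
```

Under these sizes the training set has 98,976 samples. A random or interleaved split would give the same counts, so this sketch should not be read as the authors' procedure.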