Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Towards Accurate Time Series Forecasting via Implicit Decoding

Authors: Xinyu Li, Yuchen Luo, Hao Wang, Haoxuan Li, Liuhua Peng, Feng Liu, Yandong Guo, Kun Zhang, Mingming Gong

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experimental results from multiple real-world datasets show that IF can consistently boost mainstream time series models, achieving state-of-the-art forecasting performance. Code is available at this repository: https://github.com/rakuyorain/Implicit-Forecaster. We conduct extensive experiments to evaluate the performance of mainstream TSF models equipped with IF across various datasets and forecasting scenarios, comparing them with their original results to analyze the effectiveness of IF.
Researcher Affiliation	Collaboration	1 The University of Melbourne; 2 Zhejiang University; 3 Peking University; 4 AI2 Robotics; 5 Carnegie Mellon University; 6 Mohamed bin Zayed University of Artificial Intelligence
Pseudocode	Yes	Algorithm 1 Implicit Forecaster
Open Source Code	Yes	Code is available at this repository: https://github.com/rakuyorain/Implicit-Forecaster.
Open Datasets	Yes	We comprehensively include 14 benchmark datasets commonly used in TSF for our experiments, covering various real-life domains such as energy, traffic, weather, economics, and disease. Specifically, these datasets are ETT (Electricity Transformer Temperature) with 4 subsets, ECL (Electricity Consuming Load), Traffic, Weather [53], Exchange Rate, ILI (Influenza-Like Illness) [41], PeMS (Traffic data of Caltrans Performance Measurement System) with 4 subsets [12], and Solar Energy [10]. We also provide a direct download link for all well-prepared datasets, including step-by-step instructions for running the experiments.
Dataset Splits	Yes	We split each dataset into training, validation, and test sets in respective ratios of 70%, 15%, and 15%, with all datasets divided strictly in chronological order to prevent data leakage issues.
Hardware Specification	Yes	All the experiments reported in this paper were conducted on a 16-core AMD EPYC 9654 CPU and a single NVIDIA RTX 4090 GPU.
Software Dependencies	Yes	All the models and experimental frameworks are implemented entirely in Python and built upon PyTorch 2.0 [22].
Experiment Setup	Yes	We choose Adam optimizer [9] and L2 loss to learn the model parameters and take MSE (Mean Squared Error) and MAE (Mean Absolute Error) as metrics to evaluate the models. The learning rate is scheduled to follow an exponential decay pattern during training, which is halved at the end of each epoch. The number of training epochs is determined using an early stopping strategy, where the training is stopped when the model's performance (i.e., loss) ceases to improve on the validation set for a maximum of 3 times.
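
Illustrative sketch (not taken from the paper's repository): the Dataset Splits and Experiment Setup rows describe a chronological 70%/15%/15% split, a learning rate halved at the end of each epoch, and early stopping after 3 validation checks without improvement. The plain-Python helpers below show one way these three rules could be expressed. All function names, the percentage parameters, and the reading of "a maximum of 3 times" as a patience of 3 consecutive non-improving epochs are assumptions made for illustration; the authors' actual implementation may differ.

```python
from typing import Sequence, Tuple


def chronological_split(
    series: Sequence, train_pct: int = 70, val_pct: int = 15
) -> Tuple[Sequence, Sequence, Sequence]:
    """Split a time-ordered sequence into train/val/test without shuffling.

    Integer percentages avoid floating-point rounding at the boundaries.
    The remainder after train and validation becomes the test set.
    """
    n = len(series)
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return series[:train_end], series[train_end:val_end], series[val_end:]


def halved_learning_rate(initial_lr: float, epoch: int) -> float:
    """Learning rate used during `epoch` (0-based) when it is halved after each epoch."""
    return initial_lr * 0.5 ** epoch


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 3) -> int:
    """Return the 0-based epoch at which training would stop.

    Assumption: training stops once the validation loss has failed to
    improve on its best value for `patience` consecutive epochs. If that
    never happens, the last epoch index is returned.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return len(val_losses) - 1


if __name__ == "__main__":
    # Hypothetical data: 100 time steps and a made-up validation loss curve.
    train, val, test = chronological_split(list(range(100)))
    print(len(train), len(val), len(test))
    print([halved_learning_rate(1.0, e) for e in range(4)])
    print(early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.81, 0.7]))
```

Because the split slices the sequence in order, every validation time step comes after every training time step, which is the property the paper cites as preventing data leakage. In a PyTorch training loop, the same halving schedule is commonly obtained with an exponential learning-rate scheduler using a decay factor of 0.5, though the page does not say which mechanism the authors used.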