Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Conformal Prediction for Time-series Forecasting with Change Points

Authors: Sophia Sun, Rose Yu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show strong empirical results on 3 synthetic and 3 real-world datasets. Compared to state-of-the-art CP baselines, CPTC achieves more robust coverage with comparable prediction interval sharpness (example in Figure 1), and is computationally light. 5 Experiments. Baselines. We selected the following baseline methods. Metrics. We evaluate calibration and sharpness for each method. Datasets. The three synthetic datasets are designed with increasing randomness in mode changes, challenging the adaptivity of CPTC.
Researcher Affiliation	Academia	Sophia Sun, Rose Yu Computer Science and Engineering University of California, San Diego
Pseudocode	Yes	Our CPTC algorithm is outlined in pseudocode in Algorithm 1. Algorithm 1: Conformal Prediction for Time series with Change points (CPTC)
Open Source Code	Yes	Our code is available at https://github.com/Rose-STL-Lab/CPTC.
Open Datasets	Yes	We show strong empirical results on 3 synthetic and 3 real-world datasets. For real-world datasets, the Electricity and Traffic datasets from [19] have hourly frequency and exhibit seasonality both in terms of the time series itself and volatility. The honey bee trajectory dataset [40] is the most complex, composed of 4-dimensional trajectories with length averaging to 900 frames, where the bees' dance can be decomposed into left turn, right turn and waggle. [19] D. Dua and E. Karra Taniskidou. UCI machine learning repository, 2017. Accessed: February 7, 2026. [40] S. M. Oh, J. M. Rehg, T. Balch, and F. Dellaert. Learning and inferring motion patterns using parametric segmental switching linear dynamic systems. International Journal of Computer Vision, 77:103–124, 2008.
Dataset Splits	Yes	We segment all datasets by a 70/10/20 train/validation/test split. Conformal prediction results are reported on the test set only.
Hardware Specification	Yes	All experiments are done on a server machine with an Nvidia A100 GPU, with some data processing and analytics performed on an Apple MacBook Pro laptop computer with an M1 chip.
Software Dependencies	No	No specific versions are provided for software dependencies such as Python libraries (e.g., PyTorch, TensorFlow). GluonTS [2] is mentioned as a library, but its version is not stated. The paper therefore does not describe a fully reproducible software environment with version numbers.
Experiment Setup	Yes	We train our RED-SDS on the synthetic datasets, and use the model checkpoints provided by the authors for the real-world datasets. The model architecture consists of a discrete switching component with K ∈ {2, 3} categories and a continuous state space with dimensionality d_x ∈ {2, 4}. For training, we employ the ELBOv2 objective with a learning rate η ∈ [5 × 10⁻³, 7 × 10⁻³], warmup steps of 1000, and gradient clipping at 10.0. The model uses a batch size B ∈ {32, 50} and is trained for T ∈ {20,000, 30,000} steps. The continuous transition and emission models are parameterized by nonlinear MLPs with hidden dimensions h = 32, while the inference network uses either a bidirectional RNN or transformer with embedding dimensions d_e = 4. For real-world datasets, we apply target transformation and Jacobian correction, while synthetic datasets use raw observations. The model's capacity is controlled through weight decay λ = 10⁻⁵ and MLP hidden dimensions h ∈ {8, 32, 64} depending on the component. For forecasting, the model is trained to forecast a window of t = 50 time steps for synthetic datasets (bouncing ball) and a window for real-world datasets (electricity, traffic) as specified by their respective metadata. The model generates N = 100 (x_t, y_t, z_t)_{t=0}^{k} triplet trajectories from the trained model by Monte Carlo sampling, and uses the samples to calculate prediction quantiles for Table 1, in accordance with the original paper. We segment all datasets by a 70/10/20 train/validation/test split. Conformal prediction results are reported on the test set only.
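
For readers who want to check the reported setup by hand, the following is a minimal sketch of two steps quoted in the table: the 70/10/20 train/validation/test split and the calculation of prediction quantiles from N sampled trajectories. It is not taken from the CPTC repository. The function names (`split_70_10_20`, `empirical_quantile`, `prediction_band`), the choice to split in chronological order, the linear-interpolation quantile rule, and the miscoverage level `alpha = 0.1` are all assumptions made for illustration; the paper excerpt does not state them.

```python
def split_70_10_20(series):
    """Split a sequence into 70/10/20 train/validation/test parts.

    Assumption: the split is chronological (no shuffling). The source only
    states the 70/10/20 proportions.
    """
    n = len(series)
    train_end = n * 7 // 10   # integer arithmetic avoids float rounding
    val_end = n * 8 // 10
    return series[:train_end], series[train_end:val_end], series[val_end:]


def empirical_quantile(values, q):
    """Empirical quantile of a list with linear interpolation, q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def prediction_band(samples, alpha=0.1):
    """Per-time-step (lower, upper) band from sampled trajectories.

    samples: list of N trajectories, each a list of floats of equal length.
    alpha:   hypothetical miscoverage level; the band spans the
             alpha/2 and 1 - alpha/2 empirical quantiles.
    """
    horizon = len(samples[0])
    band = []
    for t in range(horizon):
        column = [trajectory[t] for trajectory in samples]
        band.append((empirical_quantile(column, alpha / 2),
                     empirical_quantile(column, 1 - alpha / 2)))
    return band
```

With N = 100 sampled trajectories over a 50-step window, `prediction_band(samples)` would return 50 (lower, upper) pairs. This is only the raw sample-quantile band; the conformal calibration step of Algorithm 1 is not reproduced here.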