Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Topology-Aware Conformal Prediction for Stream Networks

Authors: Jifan Zhang, Fangxin Wang, Zihe Song, Philip S Yu, Kaize Ding, Shixiang Zhu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide theoretical guarantees on the validity of our approach and demonstrate its superior performance on both synthetic and real-world datasets. Our results show that STACI effectively balances prediction efficiency and coverage, outperforming existing conformal prediction methods for stream networks.
Researcher Affiliation	Academia	Jifan Zhang, Northwestern University, Evanston, IL 60208; Fangxin Wang, University of Illinois Chicago, Chicago, IL 60607; Zihe Song, University of Illinois Chicago, Chicago, IL 60607; Philip Yu, University of Illinois Chicago, Chicago, IL 60607; Kaize Ding, Northwestern University, Evanston, IL 60208; Shixiang Zhu, Carnegie Mellon University, Pittsburgh, PA 15213
Pseudocode	Yes	We provide pseudocode for our proposed method in Algorithm 1.
Open Source Code	Yes	Our code is publicly available at https://github.com/fangxin-wang/STCP.
Open Datasets	Yes	We further conduct experiments on a real-world traffic dataset, Performance Measurement System (PeMS) [6], which contains data collected from the California highway network, providing 5-minute interval traffic flow counts from multiple sensors, along with flow directions and distances between sensors.
Dataset Splits	Yes	By default, the first 60% of observations are used for training, the calibration set consists of the most recent n = 500 observations, and the test set consists of n = 5000 sequentially revealed observations in both the simulation and the real-data study.
Hardware Specification	Yes	Experiments were conducted on a single NVIDIA GeForce RTX 4080 Super GPU, an AMD Ryzen 9 7950X 16-Core Processor CPU, 64GB of memory, and a 2TB SSD.
Software Dependencies	No	AGCRN [2]: we adapted the official implementation from https://github.com/LeiBAI/AGCRN under the MIT license. ASTGCN [16] was implemented with the PyTorch Geometric Temporal package [34] from https://github.com/benedekrozemberczki/pytorch_geometric_temporal under the MIT license. STGODE [13] was adapted from the official implementation at https://github.com/square-coder/STGODE under the Apache-2.0 license.
Experiment Setup	Yes	By default, the first 60% of observations are used for training, the calibration set consists of the most recent n = 500 observations, and the test set consists of n = 5000 sequentially revealed observations in both the simulation and the real-data study. The desired confidence level α is fixed at 0.95. Our method is compared against the following conformal prediction and learning-based uncertainty quantification baselines: (i) Sphere: spherical confidence set, where the covariance matrix is the identity matrix. (ii) Sphere-ACI (γ = 0.01): spherical confidence set with adaptive conformal inference (ACI). (iii) Square: square confidence set. (iv) GT: ellipsoidal confidence set using the ground-truth covariance matrix. (v) MultiDimSPCI: ellipsoidal confidence set using the sample covariance matrix [50], alongside its localized variant using the most recent observations, MultiDimSPCI (local). (vi) CopulaCPTS: prediction region based on modeling the joint distribution of forecast errors with a copula function [39]. (vii) DeepSTUQ: a Bayesian deep learning model that quantifies uncertainty in spatio-temporal graphs using graph convolutions and Monte Carlo dropout [33]. On synthetic data, the prediction model f is a simple linear regression model. We first estimate the parameters of the AR(w) structure, Θ = (θ_i), i ∈ [w], through linear regression, and then the parameters ϕ and σ² in eq. (29) through an ℓ1 loss. Two settings, Θ = (0, 0) and Θ = (0.7, 0.3), are used for data generation. When Θ = (0, 0), the observations are pure noise and therefore stationary; when Θ = (0.7, 0.3), the process resembles a second-order autoregressive model. The weighting factor λ is set to 0.6. We adopt the Adaptive Graph Convolutional Recurrent Network (AGCRN) [2] as the default backbone model f.
To demonstrate STACI's generality under the post-hoc conformal prediction framework, we also evaluate alternative GNN backbones, including the attention-based ASTGCN [16] and the continuous-time STGODE [13]. For simplicity, we use only fixed, equal weights, which require no additional information. Hyperparameter sensitivity and ablation studies are also provided over the key parameters of our framework: (i) λ from 0 to 1 in steps of 0.02; (ii) n = 100, 200, 300, 400, 500; (iii) γ = 0 or 0.01.
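The evaluation protocol summarized above can be illustrated with a minimal sketch. It covers the chronological 60% training split, calibration on the most recent n residuals, sequential testing, and optional ACI updates with step size γ. This is not STACI and not the authors' code. It omits the topology-aware covariance and the weighting factor λ, and it replaces the noise model of eq. (29) with an assumed correlated Gaussian noise. It also fits a single AR(2) coefficient vector Θ shared by all nodes, and the function names (simulate_ar2, fit_and_residuals, score, conformal_radius, run_online) are hypothetical. The spherical, square, and sample-covariance ellipsoidal regions only loosely mirror the Sphere, Square, and MultiDimSPCI (local) baselines. As in the setup above, the target coverage is 0.95.

```python
import numpy as np


def simulate_ar2(T, d, theta=(0.7, 0.3), seed=0):
    """Hypothetical generator: AR(2) dynamics per node with correlated Gaussian noise.

    The cross-node noise covariance 0.5**|i - j| is an assumption; it does NOT
    reproduce the spatial noise model of eq. (29).
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(d)
    noise_cov = 0.5 ** np.abs(idx[:, None] - idx[None, :])
    eps = rng.multivariate_normal(np.zeros(d), noise_cov, size=T)
    y = np.zeros((T, d))
    for t in range(2, T):
        y[t] = theta[0] * y[t - 1] + theta[1] * y[t - 2] + eps[t]
    return y


def fit_and_residuals(y, train_frac=0.6):
    """Fit a shared Theta by least squares on the first 60% of time steps.

    Returns the estimate and the one-step residuals of all later time steps.
    """
    T, d = y.shape
    X = np.stack([y[1:-1], y[:-2]], axis=-1)   # lag-1 and lag-2 values, (T-2, d, 2)
    target = y[2:]                              # (T-2, d)
    n_train = int(train_frac * T) - 2
    theta_hat, *_ = np.linalg.lstsq(
        X[:n_train].reshape(-1, 2), target[:n_train].reshape(-1), rcond=None
    )
    residuals = target - X @ theta_hat          # (T-2, d)
    return theta_hat, residuals[n_train:]


def score(res, shape, cov_inv=None):
    """Nonconformity score of each residual vector (one row per time step)."""
    if shape == "sphere":     # Euclidean norm: identity covariance
        return np.linalg.norm(res, axis=1)
    if shape == "square":     # max-norm: axis-aligned hypercube
        return np.max(np.abs(res), axis=1)
    if shape == "ellipsoid":  # Mahalanobis norm under a supplied inverse covariance
        quad = np.einsum("ij,jk,ik->i", res, cov_inv, res)
        return np.sqrt(np.maximum(quad, 0.0))
    raise ValueError(f"unknown shape: {shape}")


def conformal_radius(cal_scores, level):
    """Split-conformal threshold: the ceil((n + 1) * level)-th smallest score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * level))
    if k < 1:
        return -np.inf                          # empty region
    if k > n:
        return np.inf                           # unbounded region
    return np.sort(cal_scores)[k - 1]


def run_online(res, n_cal=500, coverage=0.95, gamma=0.0, shape="sphere"):
    """Reveal test residuals one at a time and return the empirical coverage.

    Each step recalibrates on the n_cal most recent residuals. gamma > 0 applies
    the standard ACI update (Gibbs and Candes, 2021) to the working miscoverage.
    """
    target_miss = 1.0 - coverage
    alpha_t = target_miss
    hits = []
    for t in range(n_cal, len(res)):
        cal = res[t - n_cal:t]
        cov_inv = None
        if shape == "ellipsoid":                # local sample covariance
            cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(cal, rowvar=False)))
        radius = conformal_radius(score(cal, shape, cov_inv), 1.0 - alpha_t)
        hit = bool(score(res[t:t + 1], shape, cov_inv)[0] <= radius)
        hits.append(hit)
        alpha_t += gamma * (target_miss - (0.0 if hit else 1.0))
    return float(np.mean(hits))


y = simulate_ar2(T=5000, d=5)
theta_hat, res = fit_and_residuals(y)
print("estimated Theta:", np.round(theta_hat, 3))
for shape in ("sphere", "square", "ellipsoid"):
    for gamma in (0.0, 0.01):
        print(shape, gamma, run_online(res, gamma=gamma, shape=shape))
```

Setting gamma=0.0 corresponds to the γ = 0 ablation, and gamma=0.01 matches the step size used for Sphere-ACI. The residuals in this toy example are approximately i.i.d. Gaussian noise, so all three shapes should land near the nominal coverage. They differ in region size, which this sketch does not report. Estimating the ellipsoid's covariance on the same window used for calibration slightly breaks exchangeability, so that variant is only approximately valid.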