Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Exploring the Noise Robustness of Online Conformal Prediction

Authors: HuaJun Xi, Kangdao Liu, Hao Zeng, Wenguang Sun, Hongxin Wei

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To verify the effectiveness of the robust pinball loss, we conduct extensive experiments on CIFAR100 [19] and ImageNet [20] with synthetic uniform label noise. In particular, we integrate the proposed loss into ACI with constant [9] and dynamic learning rates [12], and strongly adaptive online conformal prediction [11]. Empirical results show that the robust pinball loss enhances the noise robustness of online conformal prediction by eliminating the coverage gap caused by the label noise.
Researcher Affiliation	Academia	Huajun Xi¹, Kangdao Liu¹,², Hao Zeng¹, Wenguang Sun³, Hongxin Wei¹. ¹Department of Statistics and Data Science, Southern University of Science and Technology; ²Department of Computer and Information Science, University of Macau; ³Center for Data Science, Zhejiang University. Correspondence to: Hongxin Wei <EMAIL>
Pseudocode	Yes	Algorithm 1: Noise-Robust Strongly Adaptive Online Conformal Prediction (NR-SAOCP); Algorithm 2: Noise-Robust Scale-Free Online Gradient Descent (NR-SF-OGD)
Open Source Code	Yes	We include an example code in supplemental material.
Open Datasets	Yes	To verify the effectiveness of the robust pinball loss, we conduct extensive experiments on CIFAR100 [19] and ImageNet [20] with synthetic uniform label noise.
Dataset Splits	Yes	We use CIFAR-100 [19] and ImageNet [20] datasets with synthetic label noise... we train these models for 200 epochs... the coverage gap and the prediction set size are computed over the full test set.
Hardware Specification	No	On ImageNet, we use four pre-trained classifiers from TorchVision [30]: ResNet18, ResNet50 [31], DenseNet121 [32] and VGG16 [33]. On CIFAR-100, we train these models for 200 epochs using SGD with a momentum of 0.9, a weight decay of 0.0005, and a batch size of 128.
Software Dependencies	No	On ImageNet, we use four pre-trained classifiers from TorchVision [30]: ResNet18, ResNet50 [31], DenseNet121 [32] and VGG16 [33]. On CIFAR-100, we train these models for 200 epochs using SGD with a momentum of 0.9, a weight decay of 0.0005, and a batch size of 128.
Experiment Setup	Yes	The experiments include both a constant learning rate $\eta = 0.05$ and dynamic learning rates $\eta_t = 1/t^{1/2+\varepsilon}$ with $\varepsilon = 0.1$, following prior work [12]. We use CIFAR-100 [19] and ImageNet [20] datasets with synthetic label noise. On CIFAR-100, we train these models for 200 epochs using SGD with a momentum of 0.9, a weight decay of 0.0005, and a batch size of 128. We set the initial learning rate as 0.1, and reduce it by a factor of 5 at 60, 120 and 160 epochs.
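The Open Datasets and Research Type rows refer to synthetic uniform label noise on CIFAR-100 and ImageNet. The sketch below shows one common way to inject such noise. It assumes that each label is, with probability equal to the noise rate, replaced by a class drawn uniformly from all K classes. Other conventions exclude the true class. The function name add_uniform_label_noise is hypothetical and is not taken from the paper's code.

```python
import numpy as np


def add_uniform_label_noise(labels, noise_rate, num_classes, rng=None):
    """Hypothetical helper: with probability `noise_rate`, replace each label
    by a class drawn uniformly from all `num_classes` classes (the draw may
    coincide with the original label). Returns a new array."""
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    noisy = labels.copy()
    flip = rng.random(labels.shape[0]) < noise_rate  # which labels get resampled
    noisy[flip] = rng.integers(0, num_classes, size=int(flip.sum()))
    return noisy
```

Under this convention, the fraction of labels that actually change is noise_rate · (K − 1)/K. For a 20% noise rate on the 100 classes of CIFAR-100, that is about 0.198.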
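The Research Type row describes integrating a robust pinball loss into ACI. The sketch below is a rough illustration only. It runs online gradient descent on a score threshold τ, using the gradient of the (1 − α)-quantile pinball loss, α − 1{S(x, ỹ) > τ}.

As an optional variant, it replaces the observed miscoverage indicator with a noise-corrected estimate. This estimate follows from the uniform-noise identity E[1{S(x, ỹ) > τ} | x, y] = (1 − ρ)·1{S(x, y) > τ} + (ρ/K)·Σ_k 1{S(x, k) > τ}, where ρ is the noise rate. This correction is an assumption of the sketch and may differ from the paper's robust pinball loss. All function names are hypothetical.

```python
import numpy as np


def pinball_grad(miscovered, alpha):
    # Gradient w.r.t. the threshold tau of the (1 - alpha)-quantile pinball
    # loss: alpha - 1{score > tau}.
    return alpha - float(miscovered)


def noise_corrected_grad(noisy_miscovered, all_scores, tau, alpha, noise_rate):
    # ASSUMPTION (not necessarily the paper's loss): uniform label noise, i.e.
    # the observed label is the true one w.p. 1 - r (r = noise_rate) and is
    # uniform over all K classes otherwise. Given x and the true label y,
    #   E[1{S(x, y_obs) > tau}] = (1 - r) 1{S(x, y) > tau} + r * mean_k 1{S(x, k) > tau},
    # which is inverted here to estimate the clean miscoverage indicator.
    frac_above = float(np.mean(all_scores > tau))  # mean_k 1{S(x, k) > tau}
    est = (float(noisy_miscovered) - noise_rate * frac_above) / (1.0 - noise_rate)
    return alpha - est


def run_online_threshold(scores, labels, alpha=0.1, lr=0.05,
                         noise_rate=0.0, robust=False, tau0=0.5):
    """Online gradient descent on a score threshold with a constant step size.

    scores: (T, K) non-conformity scores; labels: (T,) observed labels.
    Returns the threshold used at each step and the miscoverage indicator
    with respect to `labels`. Hypothetical helper for illustration.
    """
    tau = tau0
    taus, errs = [], []
    for s_t, y_t in zip(scores, labels):
        miscovered = bool(s_t[y_t] > tau)  # y_t outside {k : S(x_t, k) <= tau}
        taus.append(tau)
        errs.append(miscovered)
        if robust:
            g = noise_corrected_grad(miscovered, s_t, tau, alpha, noise_rate)
        else:
            g = pinball_grad(miscovered, alpha)
        tau -= lr * g  # raises tau after a miss, lowers it after a cover
    return np.array(taus), np.array(errs)
```

With noisy labels, the plain update calibrates miscoverage of the observed labels rather than the true ones. This is the source of the coverage gap the paper describes. The corrected gradient targets clean miscoverage instead, at the price of higher-variance updates.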
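The Pseudocode row lists NR-SF-OGD, a noise-robust version of scale-free online gradient descent. For orientation, here is the plain (not noise-robust) scale-free step:

τ_{t+1} = τ_t − D·g_t / sqrt(Σ_{i≤t} g_i²)

Here g_t is the gradient at step t and D is a scale parameter. This form follows the SF-OGD base learner used in strongly adaptive online conformal prediction [11]. The class below and its parameter names are hypothetical. A noise-robust version would presumably feed in a corrected gradient instead of the raw pinball gradient.

```python
import math
import random


class ScaleFreeOGD:
    """Hypothetical sketch of scale-free OGD on a scalar threshold: the step
    size is D divided by the root of the running sum of squared gradients."""

    def __init__(self, tau0=0.0, D=1.0):
        self.tau = tau0
        self.D = D
        self.sum_sq = 0.0

    def step(self, grad):
        self.sum_sq += grad * grad
        if self.sum_sq > 0:  # no update until a non-zero gradient is seen
            self.tau -= self.D * grad / math.sqrt(self.sum_sq)
        return self.tau


# Usage: track the (1 - alpha)-quantile of a score stream with pinball gradients.
random.seed(0)
alpha, opt = 0.1, ScaleFreeOGD(tau0=0.5, D=1.0)
for _ in range(20000):
    s = random.random()
    opt.step(alpha - float(s > opt.tau))
# opt.tau then fluctuates around the 0.9-quantile of Uniform(0, 1), i.e. near 0.9.
```

Because each step divides by the root of the accumulated squared gradients, multiplying every gradient by a constant leaves the iterates unchanged. That is the sense in which the method is "scale-free".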
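The Dataset Splits row reports the coverage gap and the prediction set size over the full test set. The minimal sketch below computes both metrics under three assumptions:

- prediction sets have the form C_t = {k : S(x_t, k) ≤ τ_t};
- coverage is measured against the true (clean) labels;
- the gap is signed, defined as empirical coverage minus 1 − α.

The paper's exact definitions may differ (for example, it may use an absolute gap). The name evaluate_sets is hypothetical.

```python
import numpy as np


def evaluate_sets(scores, labels, taus, alpha):
    """scores: (T, K); labels: (T,) ground-truth labels; taus: (T,) thresholds
    used at each step. Prediction set C_t = {k : scores[t, k] <= taus[t]}."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    taus = np.asarray(taus)
    in_set = scores <= taus[:, None]                    # (T, K) membership mask
    covered = in_set[np.arange(len(labels)), labels]    # true label in its set?
    coverage = float(covered.mean())
    gap = coverage - (1.0 - alpha)                      # > 0 means over-coverage
    avg_size = float(in_set.sum(axis=1).mean())
    return coverage, gap, avg_size


cov, gap, size = evaluate_sets([[0.1, 0.5, 0.9], [0.2, 0.3, 0.8]], [0, 2],
                               [0.6, 0.5], alpha=0.1)
# cov = 0.5, gap is about -0.4, size = 2.0
```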
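The Experiment Setup row specifies two learning-rate schedules:

- the online step size of ACI: a constant $\eta = 0.05$, or $\eta_t = 1/t^{1/2+\varepsilon}$ with $\varepsilon = 0.1$;
- the CIFAR-100 training schedule: an initial rate of 0.1, divided by 5 at epochs 60, 120 and 160.

The sketch below writes both as plain functions, assuming t starts at 1 and epochs are counted from 0. The function names are hypothetical.

```python
def aci_learning_rate(t, eta=0.05, dynamic=False, eps=0.1):
    """Online step size at step t (t >= 1): constant eta, or 1 / t**(1/2 + eps)."""
    if dynamic:
        return 1.0 / t ** (0.5 + eps)
    return eta


def cifar_learning_rate(epoch, base_lr=0.1, milestones=(60, 120, 160), factor=5.0):
    """Step decay: base_lr divided by `factor` once for every milestone reached
    (epochs counted from 0, so the first reduction applies from epoch 60)."""
    n_decays = sum(1 for m in milestones if epoch >= m)
    return base_lr / factor ** n_decays
```

This gives a training learning rate of 0.1, 0.02, 0.004 and 0.0008 over the four phases. In PyTorch, the same schedule corresponds to torch.optim.lr_scheduler.MultiStepLR with milestones=[60, 120, 160] and gamma=0.2, stepped once per epoch. It would be applied to torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=5e-4) with batch size 128, as stated in the table. This mapping is an illustration, not the authors' released code.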