Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning
Authors: Yan Scholten, Stephan Günnemann
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate our approach on image classification tasks, achieving strong reliability while maintaining utility and preserving coverage on clean data. |
| Researcher Affiliation | Academia | Yan Scholten, Stephan Günnemann; Department of Computer Science & Munich Data Science Institute, Technical University of Munich; EMAIL |
| Pseudocode | Yes | Algorithm 1: Reliable conformal score function. Input: $D_{\text{train}}$, $k_t$, deterministic training algorithm $T$. 1: Split $D_{\text{train}}$ into $k_t$ disjoint partitions $P_i^t = \{(x_j, y_j) \in D_{\text{train}} : h(x_j) \equiv i \pmod{k_t}\}$. 2: for $i = 1$ to $k_t$ do 3: Train classifier $f^{(i)} = T(P_i^t)$ on partition $P_i^t$. 4: Construct the voting function $\pi_y(x) = \frac{1}{k_t} \sum_{i=1}^{k_t} \mathbb{1}\{f^{(i)}(x) = y\}$. 5: Smooth the voting function $s(x, y) = e^{\pi_y(x)} / \sum_{i=1}^{K} e^{\pi_i(x)}$. Output: Reliable conformal score function $s$. Algorithm 2: Reliable conformal prediction sets. Input: $D_{\text{calib}}$, $k_c$, $s$, $\alpha$, $x_{n+1}$. 1: Split $D_{\text{calib}}$ into $k_c$ disjoint partitions $P_i^c = \{(x_j, y_j) \in D_{\text{calib}} : h(x_j) \equiv i \pmod{k_c}\}$. 2: for $i = 1$ to $k_c$ do 3: Compute scores $S_i = \{s(x_j, y_j)\}_{(x_j, y_j) \in P_i^c}$. 4: Compute the $\alpha_{n_i}$-quantile $\tau_i$ of scores $S_i$. 5: Construct the prediction set for quantile $\tau_i$: $C_i(x_{n+1}) = \{y : s(x_{n+1}, y) \geq \tau_i\}$. 6: Construct the majority vote prediction set $C^M(x_{n+1}) = \{y : \sum_{i=1}^{k_c} \mathbb{1}\{y \in C_i(x_{n+1})\} > \hat{\tau}(\alpha)\}$. Output: Reliable conformal prediction set $C^M$. |
| Open Source Code | Yes | We also provide code along with detailed reproducibility instructions via the following project page: https://www.cs.cit.tum.de/daml/reliable-conformal-prediction/. |
| Open Datasets | Yes | We train ResNet18, ResNet50 and ResNet101 models (He et al., 2016) on SVHN (Netzer et al., 2011), CIFAR10 and CIFAR100 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | We randomly select 1,000 images of the test set for calibration and use the remaining 9,000 datapoints for testing. |
| Hardware Specification | Yes | We train ResNet18 models on a NVIDIA GTX 1080TI GPU, and the ResNet50 and ResNet101 models on a NVIDIA A100 40GB. We perform inference of all models on a NVIDIA GTX 1080TI GPU, and compute certificates on a Xeon E5-2630 v4 CPU. |
| Software Dependencies | No | The paper mentions "We use the torchvision library to load the datasets." and "We further deploy a cosine learning rate scheduler (Loshchilov & Hutter, 2017)" but does not specify version numbers for these software components or the underlying framework like PyTorch. |
| Experiment Setup | Yes | We train all models with stochastic gradient descent (learning rate 0.01, momentum 0.9, weight decay 5e-4) for 400 epochs using early stopping if the training accuracy does not improve for 100 epochs. We further deploy a cosine learning rate scheduler (Loshchilov & Hutter, 2017). We use a batch size of 128 during training and 300 at inference. |
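
The two algorithms quoted in the Pseudocode row can be illustrated with a small, dependency-free Python sketch. It is not the authors' released code. Several pieces are assumptions made only for illustration: the hash `stable_hash` standing in for $h$, the toy `nearest_mean` training algorithm standing in for $T$, the quantile rule in `lower_quantile` (the row does not define $\alpha_{n_i}$), and the default `vote_threshold` of `k_c / 2` (the row does not define $\hat{\tau}(\alpha)$).

```python
import hashlib
import math


def stable_hash(x):
    """Process-independent hash of a sample (hypothetical stand-in for h)."""
    digest = hashlib.sha256(repr(x).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def partition(data, k, h=stable_hash):
    """Split (x, y) pairs into k disjoint parts by h(x) mod k."""
    parts = [[] for _ in range(k)]
    for x, y in data:
        parts[h(x) % k].append((x, y))
    return parts


def reliable_score_fn(train, k_t, train_algo, num_classes, h=stable_hash):
    """Algorithm 1 sketch: softmax over the per-class vote fractions."""
    classifiers = [train_algo(part) for part in partition(train, k_t, h)]

    def score(x, y):
        preds = [f(x) for f in classifiers]
        pi = [preds.count(c) / k_t for c in range(num_classes)]
        return math.exp(pi[y]) / sum(math.exp(p) for p in pi)

    return score


def lower_quantile(scores, alpha):
    """Assumed rule: the floor(alpha * (n + 1))-th smallest score."""
    rank = min(math.floor(alpha * (len(scores) + 1)), len(scores))
    return sorted(scores)[rank - 1] if rank >= 1 else -math.inf


def reliable_prediction_set(calib, k_c, score, alpha, x_new, num_classes,
                            vote_threshold=None, h=stable_hash):
    """Algorithm 2 sketch: one prediction set per partition, then a vote."""
    if vote_threshold is None:
        vote_threshold = k_c / 2  # assumption: plain majority
    counts = [0] * num_classes
    for part in partition(calib, k_c, h):
        tau = lower_quantile([score(x, y) for x, y in part], alpha)
        for y in range(num_classes):
            if score(x_new, y) >= tau:
                counts[y] += 1
    return {y for y in range(num_classes) if counts[y] > vote_threshold}


def nearest_mean(part):
    """Toy deterministic training algorithm on scalar features."""
    by_class = {}
    for x, y in part:
        by_class.setdefault(y, []).append(x)
    means = {c: sum(v) / len(v) for c, v in by_class.items()}
    if not means:  # empty partition: fall back to a constant prediction
        return lambda x: 0
    return lambda x: min(sorted(means), key=lambda c: abs(x - means[c]))


# Toy data: label 1 iff x >= 100; even x for training, odd x for calibration.
train = [(x, int(x >= 100)) for x in range(0, 200, 2)]
calib = [(x, int(x >= 100)) for x in range(1, 200, 2)]
identity = lambda x: x  # toy hash so the partitions are easy to inspect

score = reliable_score_fn(train, 5, nearest_mean, 2, h=identity)
print(reliable_prediction_set(calib, 5, score, 0.1, 10, 2, h=identity))
```

Because each training or calibration point lands in exactly one partition, a single poisoned point can influence at most one classifier or one per-partition quantile. The sketch makes that structure visible but does not compute the reliability certificates described in the paper.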