COLEP: Certifiably Robust Learning-Reasoning Conformal Prediction via Probabilistic Circuits

Authors: Mintong Kang, Nezihe Merve Gürel, Linyi Li, Bo Li

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, we show the validity and tightness of our certified coverage, demonstrating the robust conformal prediction of COLEP on various datasets. We have conducted extensive experiments on GTSRB, CIFAR-10, and AwA2 datasets to demonstrate the effectiveness and tightness of the certified coverage for COLEP. We show that the certified prediction coverage of COLEP is significantly higher compared with the SOTA baselines, and COLEP has weakened the tradeoff between prediction coverage and prediction set size. We perform a range of ablation studies to show the impacts of different types of knowledge.
Researcher Affiliation Academia Mintong Kang (UIUC, mintong2@illinois.edu); Nezihe Merve Gürel (TU Delft, n.m.gurel@tudelft.nl); Linyi Li (UIUC, linyi2@illinois.edu); Bo Li (UChicago & UIUC, bol@uchicago.edu)
Pseudocode No The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes We provide the source codes for implementing COLEP at https://github.com/kangmintong/COLEP.
Open Datasets Yes We evaluate COLEP on certified conformal prediction in the adversarial setting on various datasets, including GTSRB (Stallkamp et al., 2012), CIFAR-10, and AwA2 (Xian et al., 2018).
Dataset Splits Yes We use the official validation set of GTSRB including 973 samples, and randomly select 1000 samples from the test set of CIFAR-10 and AwA2 as the calibration sets for conformal prediction with the nominal level 1 − α = 0.9 across evaluations.
Hardware Specification Yes All the evaluation is done on a single A6000 GPU.
Software Dependencies No The paper mentions 'smoothed training' and references a method but does not specify software names with version numbers for reproducibility (e.g., PyTorch version, Python version, etc.).
Experiment Setup Yes The desired coverage is set to 0.9 across evaluations. Note that we leverage randomized smoothing for learning component certification (100k Monte-Carlo sampling) and consider the finite-sample errors of RSCP following Anonymous (2023) and that of COLEP following Thm. 6. We fix weights w as 1.5 and provide more details in Appendix J.1. In the evaluation of certified coverage, we compute the smoothed score or prediction probability 100,000 times for randomized smoothing and fix the ratio of the perturbation bound to the smoothing factor during certification, δ/σ_cer, as 0.5 for both RSCP and COLEP. In the evaluation of marginal coverage and set size under PGD attack, we use the attack objective of cross-entropy loss for RSCP. For COLEP, the objective is the same cross-entropy loss for the main model and the binary cross-entropy loss for the knowledge models.
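The calibration procedure described above (1000 held-out samples, nominal level 1 − α = 0.9) follows the standard split conformal recipe. A minimal sketch, with placeholder random scores standing in for the paper's (smoothed) nonconformity scores — the variable names and the uniform dummy data are assumptions, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder nonconformity scores on a calibration set of 1000 samples,
# matching the setup in the report (1 - alpha = 0.9). In COLEP these would
# come from the smoothed prediction probabilities, not from rng.uniform.
n_cal = 1000
alpha = 0.1
cal_scores = rng.uniform(size=n_cal)

# Split conformal threshold with the finite-sample correction:
# the ceil((n+1)(1-alpha))/n empirical quantile of calibration scores.
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
q_hat = np.quantile(cal_scores, q_level, method="higher")

# Prediction set for one test point: keep every class whose
# nonconformity score is at or below the calibrated threshold.
test_scores = rng.uniform(size=10)  # placeholder: one score per class
prediction_set = np.where(test_scores <= q_hat)[0]
print(q_hat, prediction_set)
```

This yields the usual marginal coverage guarantee of at least 1 − α over exchangeable data; COLEP's contribution is certifying a lower bound on this coverage under bounded adversarial perturbations, which the sketch above does not attempt.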