COLEP: Certifiably Robust Learning-Reasoning Conformal Prediction via Probabilistic Circuits
Authors: Mintong Kang, Nezihe Merve Gürel, Linyi Li, Bo Li
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show the validity and tightness of our certified coverage, demonstrating the robust conformal prediction of COLEP on various datasets. We have conducted extensive experiments on GTSRB, CIFAR-10, and AwA2 datasets to demonstrate the effectiveness and tightness of the certified coverage for COLEP. We show that the certified prediction coverage of COLEP is significantly higher compared with the SOTA baselines, and COLEP has weakened the tradeoff between prediction coverage and prediction set size. We perform a range of ablation studies to show the impacts of different types of knowledge. |
| Researcher Affiliation | Academia | Mintong Kang UIUC mintong2@illinois.edu Nezihe Merve Gürel TU Delft n.m.gurel@tudelft.nl Linyi Li UIUC linyi2@illinois.edu Bo Li UChicago & UIUC bol@uchicago.edu |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the source codes for implementing COLEP at https://github.com/kangmintong/COLEP. |
| Open Datasets | Yes | We evaluate COLEP on certified conformal prediction in the adversarial setting on various datasets, including GTSRB (Stallkamp et al., 2012), CIFAR-10, and AwA2 (Xian et al., 2018). |
| Dataset Splits | Yes | We use the official validation set of GTSRB including 973 samples, and randomly select 1000 samples from the test set of CIFAR-10 and AwA2 as the calibration sets for conformal prediction with the nominal level 1 − α = 0.9 across evaluations. |
| Hardware Specification | Yes | All the evaluation is done on a single A6000 GPU. |
| Software Dependencies | No | The paper mentions 'smoothed training' and references a method but does not specify software names with version numbers for reproducibility (e.g., PyTorch version, Python version, etc.). |
| Experiment Setup | Yes | The desired coverage is set to 0.9 across evaluations. Note that we leverage randomized smoothing for learning component certification (100k Monte-Carlo sampling) and consider the finite-sample errors of RSCP following Anonymous (2023) and that of COLEP following Thm. 6. We fix weights w as 1.5 and provide more details in Appendix J.1. In the evaluation of certified coverage, we compute the smoothed score or prediction probability 100,000 times for randomized smoothing and fix the ratio of the perturbation bound to the smoothing factor during certification, δ/σ_cer, as 0.5 for both RSCP and COLEP. In the evaluation of marginal coverage and set size under PGD attack, we use the attack objective of cross-entropy loss for RSCP. For COLEP, the objective is the same cross-entropy loss for the main model and the binary cross-entropy loss for the knowledge models. |
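The calibration setup described above (a held-out calibration set and a nominal level 1 − α = 0.9) follows standard split conformal prediction. As a minimal sketch of that calibration step, here is a generic, NumPy-only illustration; the function names (`conformal_quantile`, `prediction_set`) are hypothetical and not from the COLEP codebase, and the nonconformity score is left abstract:

```python
import numpy as np

def conformal_quantile(cal_scores, alpha=0.1):
    """Finite-sample-corrected conformal threshold.

    cal_scores: nonconformity scores of the n calibration samples
    (e.g. 1 minus the predicted probability of the true label).
    Uses the ceil((n+1)(1-alpha))/n quantile, the standard split
    conformal correction that guarantees >= 1 - alpha coverage.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    n = cal_scores.size
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, q_level, method="higher")

def prediction_set(test_scores, tau):
    """All labels whose nonconformity score falls below the threshold."""
    return [k for k, s in enumerate(test_scores) if s <= tau]

# Example: 100 calibration scores, nominal level 1 - alpha = 0.9.
tau = conformal_quantile(np.arange(1, 101) / 100, alpha=0.1)
labels = prediction_set([0.5, 0.95, 0.1], tau)
```

With 1000 calibration samples (as for CIFAR-10 and AwA2 above), the correction uses the ceil(1001 × 0.9)/1000 empirical quantile of the calibration scores; COLEP additionally certifies this threshold against bounded perturbations, which the sketch does not cover.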