Provably Robust Conformal Prediction with Improved Efficiency

Authors: Ge Yan, Yaniv Romano, Tsui-Wei Weng

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on CIFAR10, CIFAR100, and ImageNet suggest the baseline method only yields trivial predictions (i.e., the full label set), while our methods could boost the efficiency by up to 4.36×, 5.46×, and 16.9× respectively and provide a practical robustness guarantee.
Researcher Affiliation | Academia | Ge Yan (CSE, UCSD) geyan@ucsd.edu; Yaniv Romano (ECE, Technion) yromano@technion.ac.il; Tsui-Wei Weng (HDSI, UCSD) lweng@ucsd.edu
Pseudocode | No | The paper includes a diagram in Figure 2 labeled 'RSCP+ algorithm', but it is a visual depiction of the steps rather than a structured pseudocode block or algorithm environment; an illustrative sketch of the Monte Carlo smoothing step appears after the table.
Open Source Code | Yes | The code is released at https://github.com/Trustworthy-ML-Lab/Provably-Robust-Conformal-Prediction.
Open Datasets | Yes | Experiments are conducted on CIFAR10, CIFAR100 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009).
Dataset Splits | Yes | For evaluation, we split the validation set into three subsets: Dholdout for ranking transformation in Sec. 4.1, Dcal for calibration, and Dtest for evaluation of results. For the size of each subset, please refer to Tab. D.2. ... Table D.2: Split of each dataset — CIFAR10: 50000 training / 500 holdout / 4750 calibration / 4750 test; CIFAR100: 50000 training / 500 holdout / 4750 calibration / 4750 test; ImageNet: 500 holdout / 24750 calibration / 24750 test.
Hardware Specification | Yes | The experiments are performed on 2 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions software components and training parameters such as 'SGD with momentum 0.9' and 'Nesterov gradient' (Table D.1), but it does not provide version numbers for key software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or programming languages.
Experiment Setup | Yes | Hyperparameters. In RSCP+, we choose β = 0.001 and the number of Monte Carlo examples N_MC = 256. For PTT, we choose b = 0.9 and T = 1/400, and we discuss this choice in Appendix B.4. The size of the holdout set is |Dholdout| = 500. ... In training, we use SGD with momentum 0.9 and Nesterov gradient. Weight decay is set to 0.0005. We finetune the model for N_epoch = 150 epochs and scale the learning rate down by 0.1 at epochs 60, 90 and 120. In Eq. (25), we choose N_train = 8. For other hyper-parameters, see Tab. D.1.
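
Since the Pseudocode row notes that RSCP+ is only described pictorially (Figure 2), below is a minimal sketch, not the authors' released code, of the Monte Carlo smoothed-score step that the quoted hyperparameters (β = 0.001, N_MC = 256) feed into. The helper `model_score_fn` (a base conformity score in [0, 1], e.g. HPS/APS) and the exact placement of the Hoeffding-style correction are assumptions made for illustration; see the paper and repository for the precise construction.

```python
# Illustrative sketch only: Monte Carlo estimate of a randomized-smoothing
# conformity score with a Hoeffding-style finite-sample correction.
import numpy as np
from scipy.stats import norm


def mc_smoothed_score(model_score_fn, x, y, sigma, n_mc=256, beta=0.001, rng=None):
    """Return a (lower, upper) bracket on Phi^{-1}(E[base score]) that holds
    with probability >= 1 - beta by Hoeffding's inequality.

    model_score_fn(x_batch, y) -> base conformity scores in [0, 1] for label y
    (an assumed helper, not part of the paper's released API).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gaussian smoothing noise, one perturbed copy of x per Monte Carlo sample.
    noise = rng.normal(scale=sigma, size=(n_mc,) + x.shape)
    scores = model_score_fn(x[None] + noise, y)           # shape (n_mc,)
    est = scores.mean()
    # Hoeffding deviation bound for bounded scores in [0, 1].
    slack = np.sqrt(np.log(1.0 / beta) / (2.0 * n_mc))
    lo = np.clip(est - slack, 1e-6, 1.0 - 1e-6)
    hi = np.clip(est + slack, 1e-6, 1.0 - 1e-6)
    # Phi^{-1} is monotone, so the bracket carries over to the smoothed score.
    return norm.ppf(lo), norm.ppf(hi)
```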
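For the Dataset Splits row, the following is a minimal sketch, assuming a PyTorch-style dataset object, of the three-way split of the validation set into holdout, calibration, and test subsets with the CIFAR sizes from Table D.2. The function and argument names are placeholders, not the authors' code.

```python
# Illustrative sketch only: split a 10,000-sample validation set into
# holdout (500), calibration (4750), and test (4750), as in Table D.2.
import torch
from torch.utils.data import random_split


def split_validation(val_set, n_holdout=500, n_cal=4750, n_test=4750, seed=0):
    assert n_holdout + n_cal + n_test <= len(val_set)
    generator = torch.Generator().manual_seed(seed)
    # Remaining samples (zero for the CIFAR sizes above) are discarded.
    remainder = len(val_set) - n_holdout - n_cal - n_test
    holdout, cal, test, _ = random_split(
        val_set, [n_holdout, n_cal, n_test, remainder], generator=generator
    )
    return holdout, cal, test
```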
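For the Experiment Setup row, here is a minimal PyTorch sketch, not the released training script, of the quoted fine-tuning recipe: SGD with momentum 0.9 and Nesterov gradient, weight decay 0.0005, 150 epochs, and the learning rate scaled by 0.1 at epochs 60, 90 and 120. The base learning rate is a placeholder; the paper lists the remaining per-dataset values in Table D.1.

```python
# Illustrative sketch only: optimizer and schedule matching the quoted setup.
import torch


def make_optimizer_and_scheduler(model, lr=0.1):
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=lr,                 # placeholder; per-dataset values are in Tab. D.1
        momentum=0.9,
        nesterov=True,
        weight_decay=5e-4,
    )
    # Scale the learning rate down by 0.1 at epochs 60, 90, and 120.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[60, 90, 120], gamma=0.1
    )
    return optimizer, scheduler


# Typical use: for epoch in range(150): train one epoch, then scheduler.step().
```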