Provably Robust Conformal Prediction with Improved Efficiency

Authors: Ge Yan, Yaniv Romano, Tsui-Wei Weng

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on CIFAR10, CIFAR100, and ImageNet suggest the baseline method only yields trivial predictions (i.e., the full label set), while our methods could boost the efficiency by up to 4.36×, 5.46×, and 16.9× respectively and provide a practical robustness guarantee.
Researcher Affiliation | Academia | Ge Yan (CSE, UCSD) geyan@ucsd.edu; Yaniv Romano (ECE, Technion) yromano@technion.ac.il; Tsui-Wei Weng (HDSI, UCSD) lweng@ucsd.edu
Pseudocode | No | The paper includes a diagram in Figure 2 labeled 'RSCP+ algorithm', but it is a visual depiction of the steps rather than a structured pseudocode block or algorithm environment; an illustrative sketch of the Monte Carlo smoothing step appears after the table.
Open Source Code | Yes | The code is released at https://github.com/Trustworthy-ML-Lab/Provably-Robust-Conformal-Prediction.
Open Datasets | Yes | Experiments are conducted on CIFAR10, CIFAR100 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009).
Dataset Splits | Yes | For evaluation, we split the validation set into three subsets: Dholdout for ranking transformation in Sec. 4.1, Dcal for calibration, and Dtest for evaluation of results. For the size of each subset, please refer to Tab. D.2. ... Table D.2: Split of each dataset — CIFAR10: 50000 training / 500 holdout / 4750 calibration / 4750 test; CIFAR100: 50000 training / 500 holdout / 4750 calibration / 4750 test; ImageNet: 500 holdout / 24750 calibration / 24750 test.
Hardware Specification | Yes | The experiments are performed on 2 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions software components and training parameters such as 'SGD with momentum 0.9' and 'Nesterov gradient' (Table D.1), but it does not provide version numbers for key software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or programming languages.
Experiment Setup | Yes | Hyperparameters. In RSCP+, we choose β = 0.001 and the number of Monte Carlo examples N_MC = 256. For PTT, we choose b = 0.9 and T = 1/400, and we discuss this choice in Appendix B.4. The size of the holdout set is |Dholdout| = 500. ... In training, we use SGD with momentum 0.9 and Nesterov gradient. Weight decay is set to 0.0005. We finetune the model for N_epoch = 150 epochs and scale the learning rate down by 0.1 at epochs 60, 90 and 120. In Eq. (25), we choose N_train = 8. For other hyper-parameters, see Tab. D.1.
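
Since the Pseudocode row notes that RSCP+ is only described pictorially (Figure 2), below is a minimal sketch, not the authors' released code, of the Monte Carlo smoothed-score step that the quoted hyperparameters (β = 0.001, N_MC = 256) feed into. The helper `model_score_fn` (a base conformity score in [0, 1], e.g. HPS/APS) and the exact placement of the Hoeffding-style correction are assumptions made for illustration; see the paper and repository for the precise construction.

```python
# Illustrative sketch only: Monte Carlo estimate of a randomized-smoothing
# conformity score with a Hoeffding-style finite-sample correction.
import numpy as np
from scipy.stats import norm


def mc_smoothed_score(model_score_fn, x, y, sigma, n_mc=256, beta=0.001, rng=None):
    """Return a (lower, upper) bracket on Phi^{-1}(E[base score]) that holds
    with probability >= 1 - beta by Hoeffding's inequality.

    model_score_fn(x_batch, y) -> base conformity scores in [0, 1] for label y
    (an assumed helper, not part of the paper's released API).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gaussian smoothing noise, one perturbed copy of x per Monte Carlo sample.
    noise = rng.normal(scale=sigma, size=(n_mc,) + x.shape)
    scores = model_score_fn(x[None] + noise, y)           # shape (n_mc,)
    est = scores.mean()
    # Hoeffding deviation bound for bounded scores in [0, 1].
    slack = np.sqrt(np.log(1.0 / beta) / (2.0 * n_mc))
    lo = np.clip(est - slack, 1e-6, 1.0 - 1e-6)
    hi = np.clip(est + slack, 1e-6, 1.0 - 1e-6)
    # Phi^{-1} is monotone, so the bracket carries over to the smoothed score.
    return norm.ppf(lo), norm.ppf(hi)
```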
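For the Dataset Splits row, the following is a minimal sketch, assuming a PyTorch-style dataset object, of the three-way split of the validation set into holdout, calibration, and test subsets with the CIFAR sizes from Table D.2. The function and argument names are placeholders, not the authors' code.

```python
# Illustrative sketch only: split a 10,000-sample validation set into
# holdout (500), calibration (4750), and test (4750), as in Table D.2.
import torch
from torch.utils.data import random_split


def split_validation(val_set, n_holdout=500, n_cal=4750, n_test=4750, seed=0):
    assert n_holdout + n_cal + n_test <= len(val_set)
    generator = torch.Generator().manual_seed(seed)
    # Remaining samples (zero for the CIFAR sizes above) are discarded.
    remainder = len(val_set) - n_holdout - n_cal - n_test
    holdout, cal, test, _ = random_split(
        val_set, [n_holdout, n_cal, n_test, remainder], generator=generator
    )
    return holdout, cal, test
```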
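For the Experiment Setup row, here is a minimal PyTorch sketch, not the released training script, of the quoted fine-tuning recipe: SGD with momentum 0.9 and Nesterov gradient, weight decay 0.0005, 150 epochs, and the learning rate scaled by 0.1 at epochs 60, 90 and 120. The base learning rate is a placeholder; the paper lists the remaining per-dataset values in Table D.1.

```python
# Illustrative sketch only: optimizer and schedule matching the quoted setup.
import torch


def make_optimizer_and_scheduler(model, lr=0.1):
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=lr,                 # placeholder; per-dataset values are in Tab. D.1
        momentum=0.9,
        nesterov=True,
        weight_decay=5e-4,
    )
    # Scale the learning rate down by 0.1 at epochs 60, 90, and 120.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[60, 90, 120], gamma=0.1
    )
    return optimizer, scheduler


# Typical use: for epoch in range(150): train one epoch, then scheduler.step().
```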