Verifiably Robust Conformal Prediction

Authors: Linus Jeary, Tom Kuipers, Mehran Hosseini, Nicola Paoletti

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate and compare our approach on image classification tasks (CIFAR10, CIFAR100, and Tiny ImageNet) and regression tasks for deep reinforcement learning environments.
Researcher Affiliation | Academia | Linus Jeary, Department of Informatics, King's College London, UK (linus.jeary@kcl.ac.uk); Tom Kuipers, Department of Informatics, King's College London, UK (tom.kuipers@kcl.ac.uk); Mehran Hosseini, Department of Informatics, King's College London, UK (mehran.hosseini@kcl.ac.uk); Nicola Paoletti, Department of Informatics, King's College London, UK (nicola.paoletti@kcl.ac.uk)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks that are clearly labelled or formatted as code procedures.
Open Source Code | Yes | Code for the experiments is available at: https://github.com/ddv-lab/Verifiably_Robust_CP
Open Datasets | Yes | We evaluate and compare our approach on image classification tasks (CIFAR10, CIFAR100, and Tiny ImageNet) and regression tasks for deep reinforcement learning environments. [...] We evaluate our VRCP framework on regression tasks from the PettingZoo Multi-Particle Environment (MPE) library (Terry et al., 2021) for deep reinforcement learning.
Dataset Splits | Yes | We evaluate each method using a nominal coverage of 1 − α = 0.9 and report the 95% confidence intervals for coverage and average set sizes computed over 50 splits (n_splits = 50) of the calibration, holdout and test set. [...] For CIFAR10 and CIFAR100, |D_train| = 50,000 and for Tiny ImageNet |D_train| = 100,000. For all datasets, |D_cal| = 4,500 and |D_test| = 5,000. [...] We partition the dataset into the following partitions: |D_train| = 1,000, |D_cal| = 2,000 and |D_test| = 2,000. [See the split and coverage sketch after this table.]
Hardware Specification | Yes | All experimental results were obtained from running the code provided in our GitHub repository on a server with 2× Intel Xeon Platinum 8360Y CPUs (36 cores, 72 threads, 2.4 GHz), 512 GB of RAM and an NVIDIA A40 48 GB GPU.
Software Dependencies | No | The paper mentions several software components, such as auto_LiRPA, CROWN, α-CROWN, the PGD attack algorithm, and the PettingZoo Multi-Particle Environment (MPE) library. However, it does not provide version numbers for these dependencies, which would be needed for an exactly reproducible setup.
Experiment Setup | Yes | RSCP+-based approaches use σ = 2ε, β = 0.001, and those with PTT use |D_hold| = 500, b = 0.9 and T = 1/400. For PGD, we choose a step size of 1/255 and compute 100 steps for each attack. [...] All models are trained for 200 epochs with a batch size of 128 using the stochastic gradient descent optimiser with momentum set to 0.9. We also employ a weight decay of 5 × 10⁻⁴ and a cosine annealing learning rate scheduler. [...] The quantile regressors are each trained for 400 epochs, with a learning rate of 10⁻⁵, dropout of 0.1 and a decay of 10⁻⁵. [See the PGD and training sketches after this table.]
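
For context on the Dataset Splits row, the following is a minimal sketch of how the 50 random calibration/holdout/test partitions and a split-conformal quantile at nominal coverage 1 − α = 0.9 could be produced. Only the split sizes, n_splits = 50, and α = 0.1 come from the paper; the function names and the use of NumPy are our own assumptions, and this is the vanilla split-CP calibration step, not the authors' VRCP method.

import numpy as np

def random_partition(n_total, n_cal, n_hold, n_test, seed):
    # One random partition of example indices into calibration,
    # holdout, and test sets (sizes taken from the quoted setup).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)
    return (idx[:n_cal],
            idx[n_cal:n_cal + n_hold],
            idx[n_cal + n_hold:n_cal + n_hold + n_test])

def conformal_quantile(cal_scores, alpha=0.1):
    # Standard split-CP threshold: the ceil((n + 1)(1 - alpha)) / n
    # empirical quantile of the calibration nonconformity scores
    # guarantees >= 1 - alpha marginal coverage.
    n = len(cal_scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(cal_scores, level, method="higher")

# 50 splits with the image-classification sizes reported in the paper:
# |D_cal| = 4,500, |D_hold| = 500, |D_test| = 5,000 (10,000 examples total).
partitions = [random_partition(10_000, 4_500, 500, 5_000, seed=s)
              for s in range(50)]

The paper's contribution lies in making the nonconformity scores themselves robust via neural network verification (e.g. CROWN/auto_LiRPA bounds); this sketch covers only the generic calibration machinery around that.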
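The PGD settings in the Experiment Setup row (step size 1/255, 100 steps per attack) match a standard untargeted L∞ projected-gradient attack. Below is a minimal PyTorch sketch under that assumption; the model, ε, and the [0, 1] pixel clamp are placeholders, not details quoted from the paper.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, step_size=1/255, steps=100):
    # Untargeted L-infinity PGD; step size and step count follow the paper.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()    # ascent step on the loss
            delta.clamp_(-eps, eps)                   # project into the eps-ball
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep pixels in [0, 1]
        delta.grad.zero_()
    return (x + delta).detach()

Taking the sign of the gradient rather than the raw gradient keeps every step at a fixed L∞ magnitude, which is the standard PGD design the quoted step size refers to.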
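Likewise, the quoted training hyperparameters map onto a conventional PyTorch setup. In this sketch only the epoch count, batch size, momentum, weight decay, and cosine-annealing scheduler come from the paper; the ResNet-18 model, the base learning rate of 0.1, and the torchvision data pipeline are assumptions for illustration.

import torch
import torchvision
import torchvision.transforms as T

# Placeholder model and data; only the hyperparameters below are from the paper.
model = torchvision.models.resnet18(num_classes=10)
train_set = torchvision.datasets.CIFAR10(root="data", train=True, download=True,
                                         transform=T.ToTensor())
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

optimiser = torch.optim.SGD(model.parameters(),
                            lr=0.1,              # assumed; not stated in the quotes
                            momentum=0.9,        # from the paper
                            weight_decay=5e-4)   # from the paper
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimiser, T_max=200)

for epoch in range(200):                         # 200 epochs, per the paper
    for x, y in loader:
        optimiser.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimiser.step()
    scheduler.step()                             # anneal once per epoch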