Verifiably Robust Conformal Prediction
Authors: Linus Jeary, Tom Kuipers, Mehran Hosseini, Nicola Paoletti
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate and compare our approach on image classification tasks (CIFAR10, CIFAR100, and Tiny ImageNet) and regression tasks for deep reinforcement learning environments. |
| Researcher Affiliation | Academia | Linus Jeary, Department of Informatics, King's College London, UK, linus.jeary@kcl.ac.uk; Tom Kuipers, Department of Informatics, King's College London, UK, tom.kuipers@kcl.ac.uk; Mehran Hosseini, Department of Informatics, King's College London, UK, mehran.hosseini@kcl.ac.uk; Nicola Paoletti, Department of Informatics, King's College London, UK, nicola.paoletti@kcl.ac.uk |
| Pseudocode | No | The paper does not contain structured pseudocode or clearly labelled algorithm blocks formatted as code-like procedures. |
| Open Source Code | Yes | Code for the experiments is available at: https://github.com/ddv-lab/Verifiably_Robust_CP |
| Open Datasets | Yes | We evaluate and compare our approach on image classification tasks (CIFAR10, CIFAR100, and Tiny ImageNet) and regression tasks for deep reinforcement learning environments. [...] We evaluate our VRCP framework on regression tasks from the PettingZoo Multi-Particle Environment (MPE) library Terry et al. (2021) for deep reinforcement learning. |
| Dataset Splits | Yes | We evaluate each method using a nominal coverage of 1 − α = 0.9 and report the 95% confidence intervals for coverage and average set sizes computed over 50 splits (n_splits = 50) of the calibration, holdout and test set. [...] For CIFAR10 and CIFAR100 |D_train| = 50,000 and for Tiny ImageNet |D_train| = 100,000. For all datasets |D_cal| = 4,500 and |D_test| = 5,000. [...] We partition the dataset into the following partitions: |D_train| = 1,000, |D_cal| = 2,000 and |D_test| = 2,000. (A split-conformal calibration sketch follows the table.) |
| Hardware Specification | Yes | All experimental results were obtained from running the code provided in our GitHub repository on a server with 2× Intel Xeon Platinum 8360Y (36 cores, 72 threads, 2.4 GHz), 512 GB of RAM and an NVIDIA A40 48 GB GPU. |
| Software Dependencies | No | The paper mentions several software components, including auto_LiRPA, CROWN, α-CROWN, the PGD attack algorithm, and the PettingZoo Multi-Particle Environment (MPE) library. However, it does not provide version numbers for these dependencies, which are needed for a fully reproducible description. (An auto_LiRPA bound-computation sketch follows the table.) |
| Experiment Setup | Yes | RSCP+ based approaches use σ = 2ϵ, β = 0.001 and those with PTT use |D_hold| = 500, b = 0.9 and T = 1/400. For PGD, we choose a step size of 1/255 and compute 100 steps for each attack. [...] All models are trained for 200 epochs with a batch size of 128 using the stochastic gradient descent optimiser with momentum set to 0.9. We also employ a weight decay of 5 × 10⁻⁴ and a cosine annealing learning rate scheduler. [...] The quantile regressors are each trained for 400 epochs, with a learning rate of 10⁻⁵, dropout of 0.1 and a decay of 10⁻⁵. (A PGD and optimiser sketch follows the table.) |
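
The split sizes and nominal coverage reported above correspond to the standard split conformal calibration recipe. Below is a minimal sketch, assuming generic nonconformity scores; the `scores` array, the sizes, and `alpha = 0.1` are illustrative stand-ins taken from the figures quoted above, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes from the paper: |D_cal| = 4,500, |D_test| = 5,000,
# nominal coverage 1 - alpha = 0.9, repeated over n_splits = 50 random splits.
n_cal, n_test, alpha, n_splits = 4_500, 5_000, 0.1, 50

def split_conformal_quantile(cal_scores: np.ndarray, alpha: float) -> float:
    """Standard split-CP threshold: the ceil((n+1)(1-alpha))/n empirical quantile."""
    n = len(cal_scores)
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(level, 1.0), method="higher")

# Hypothetical nonconformity scores standing in for the model's outputs.
scores = rng.uniform(size=n_cal + n_test)

coverages = []
for _ in range(n_splits):
    perm = rng.permutation(len(scores))
    cal, test = scores[perm[:n_cal]], scores[perm[n_cal:]]
    q = split_conformal_quantile(cal, alpha)
    coverages.append(np.mean(test <= q))  # fraction of test scores within the threshold
```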
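The paper's verified bounds come from auto_LiRPA's CROWN and α-CROWN methods. The following is a minimal sketch of the typical auto_LiRPA workflow; the network, input shape, and ϵ are placeholder assumptions, and the authors' exact usage lives in their repository.

```python
import torch
import torch.nn as nn
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

# Placeholder classifier; the paper's actual architectures are in its repo.
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
eps = 2 / 255  # illustrative l-inf perturbation radius

# Wrap the model and input with an l-inf ball of radius eps.
model = BoundedModule(net, torch.empty_like(x))
ptb = PerturbationLpNorm(norm=float("inf"), eps=eps)
x_bounded = BoundedTensor(x, ptb)

# Certified lower/upper bounds on the logits over the whole perturbation ball.
lb, ub = model.compute_bounds(x=(x_bounded,), method="CROWN")
# method="CROWN-Optimized" enables alpha-CROWN's tighter, optimized bounds
# at higher computational cost.
```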
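The PGD and training hyperparameters reported above map onto standard PyTorch components. This is a hedged sketch, assuming a placeholder model and batch and an illustrative base learning rate; the paper does not tie these exact objects together in code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, step_size=1 / 255, steps=100):
    """l-inf PGD with the step size (1/255) and step count (100) reported above."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)             # keep inputs in valid pixel range
    return x_adv.detach()

# Placeholder model and batch; eps is an illustrative radius.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
x_adv = pgd_attack(model, x, y, eps=2 / 255)

# Optimiser and scheduler as reported: SGD with momentum 0.9, weight decay
# 5e-4, cosine annealing over the 200 training epochs. Base lr is illustrative.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
```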