An Information Theoretic Perspective on Conformal Prediction
Authors: Alvaro Correia, Fabio Valerio Massoli, Christos Louizos, Arash Behboodi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically study two applications of our theoretical results, namely conformal prediction with side information and conformal training with our upper bounds on the conditional entropy as optimization objectives. We focus our experiments on classification tasks since this is the most common setting in previous works in conformal training [8, 13, 61]. In Table 1, we report the empirical inefficiency on test data considering two SCP methods, threshold CP with probabilities (or THR) [56] and APS [55]; see Appendix G.1.1 for results with RAPS [4]. |
| Researcher Affiliation | Industry | Qualcomm AI Research Amsterdam, The Netherlands {acorreia, fmassoli, clouizos, behboodi}@qti.qualcomm.com |
| Pseudocode | Yes | Algorithm 1: Conformal training algorithm. |
| Open Source Code | Yes | The source code will be hosted at github.com/Qualcomm-AI-research/info_cp. |
| Open Datasets | Yes | We test the effectiveness of our upper bounds as objectives for conformal training in five data sets: MNIST [29], Fashion-MNIST [69], EMNIST [11], CIFAR10 and CIFAR100 [25]. |
| Dataset Splits | Yes | For each data set, we use the default train and test splits but transfer 10% of the training data to the test data set. We train the classifiers only on the remaining 90% of the training data and, at test time, run SCP with 10 different calibration/test splits by randomly splitting the enlarged test data set. |
| Hardware Specification | No | The paper mentions 'commercially available NVIDIA GPUs' but does not provide specific model numbers or detailed hardware specifications. |
| Software Dependencies | No | The paper mentions 'Python 3', 'Pytorch [49]', and 'torchvision [40]' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We followed the experimental procedure of [61], and for each dataset and each method, we ran a grid search over the following hyperparameters using ray tune [45]: Batch size with possible values in {100, 500, 1000}. Learning rate with possible values in {0.05, 0.01, 0.005}. Temperature used in relaxing the construction of prediction sets at training time. We considered temperature values in {0.01, 0.1, 0.5, 1.0}. Steepness of the differentiable sorting algorithm (monotonic sorting networks with Cauchy distribution [50]), which regulates the smoothness of the sorting operator; the higher the steepness value, the closer the differentiable sorting operator is to standard sorting. We considered steepness values in {1, 10, 100}. |
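
The Research Type row above refers to split conformal prediction (SCP) with THR [56] and APS [55]. Below is a minimal sketch of SCP with the THR score, assuming softmax probabilities as input; the function name and the numpy implementation are illustrative, not taken from the paper's code.

```python
import numpy as np

def thr_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction with the THR score (1 - softmax prob of the true label).

    cal_probs:  (n, K) softmax probabilities on the calibration split
    cal_labels: (n,)   integer labels on the calibration split
    test_probs: (m, K) softmax probabilities on the test split
    Returns a boolean (m, K) matrix marking which labels enter each prediction set.
    """
    n = len(cal_labels)
    # Nonconformity score of the true label for each calibration example.
    cal_scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Conformal quantile with the finite-sample correction ceil((n + 1)(1 - alpha)) / n.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(cal_scores, level, method="higher")
    # A label is included whenever its score does not exceed the threshold.
    return (1.0 - test_probs) <= q_hat
```

The empirical inefficiency reported in the paper's Table 1 is the average size of such prediction sets; APS differs from THR only in its nonconformity score, which accumulates sorted class probabilities.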
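The Pseudocode row points to Algorithm 1, the paper's conformal training algorithm. The sketch below shows a generic conformal-training step in the spirit of [61]: the mini-batch is split into a calibration half and a prediction half, a soft threshold is computed on the calibration half, and a differentiable loss on the relaxed prediction sets is backpropagated. The paper instead optimizes its conditional-entropy upper bounds and uses differentiable sorting networks; the plain quantile and expected-set-size loss here are stand-ins, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def conformal_training_step(model, x, y, alpha=0.01, temperature=0.1):
    """One illustrative mini-batch step of a generic conformal-training objective.

    The batch is split in half: the 'calibration' half estimates a threshold on
    the true-class probabilities, the other half is scored against it and
    penalized for large soft prediction sets. The paper's actual objectives are
    its conditional-entropy upper bounds; a plain size loss stands in here.
    """
    probs = F.softmax(model(x), dim=-1)
    n = x.shape[0] // 2
    cal_probs, cal_y = probs[:n], y[:n]
    pred_probs = probs[n:]

    # THR-style conformity score: softmax probability of the true label.
    cal_scores = cal_probs[torch.arange(n), cal_y]
    # Simple quantile as a stand-in for the differentiable sorting used in the paper.
    tau = torch.quantile(cal_scores, alpha)

    # Soft set membership: sigmoid relaxation of the indicator {p_k >= tau}.
    soft_sets = torch.sigmoid((pred_probs - tau) / temperature)
    # Penalize the expected prediction-set size beyond one label per example.
    size_loss = torch.clamp(soft_sets.sum(dim=-1) - 1.0, min=0.0).mean()
    return size_loss
```

The temperature and the smoothness of the thresholding step play the same role as the temperature and steepness hyperparameters searched over in the Experiment Setup row.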
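The Dataset Splits row describes moving 10% of the default training split into the test pool, training on the remaining 90%, and drawing 10 random calibration/test splits from the enlarged pool. A small index-only sketch of that protocol follows, assuming a 50/50 calibration/test ratio (the exact ratio is not stated in the excerpt); all names are illustrative.

```python
import numpy as np

def make_splits(n_train, n_test, n_trials=10, seed=0):
    """Index bookkeeping for the split protocol quoted above (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    train_idx = rng.permutation(n_train)
    n_moved = int(0.1 * n_train)
    # 10% of the training data moves into the test pool; the rest trains the classifier.
    moved, kept_train = train_idx[:n_moved], train_idx[n_moved:]
    pool_size = n_test + n_moved  # enlarged test pool

    splits = []
    for _ in range(n_trials):
        pool = rng.permutation(pool_size)
        # Random calibration/test split of the enlarged pool (50/50 assumed here).
        half = pool_size // 2
        splits.append((pool[:half], pool[half:]))
    return kept_train, moved, splits
```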
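The Experiment Setup row quotes a grid search run with ray tune [45]. Below is a minimal sketch of that search space using Ray Tune's grid_search; train_model is a hypothetical trainable and the reported metric is a placeholder, not the paper's actual training code.

```python
from ray import tune

# Hyperparameter grid quoted in the Experiment Setup row.
search_space = {
    "batch_size": tune.grid_search([100, 500, 1000]),
    "learning_rate": tune.grid_search([0.05, 0.01, 0.005]),
    "temperature": tune.grid_search([0.01, 0.1, 0.5, 1.0]),
    "steepness": tune.grid_search([1, 10, 100]),
}

def train_model(config):
    # Placeholder: train with config["batch_size"], config["learning_rate"], etc.
    # Returning a dict reports it to Ray Tune as the trial's final result.
    return {"val_loss": 0.0}

tuner = tune.Tuner(train_model, param_space=search_space)
results = tuner.fit()
```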