An Information Theoretic Perspective on Conformal Prediction
Authors: Alvaro Correia, Fabio Valerio Massoli, Christos Louizos, Arash Behboodi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically study two applications of our theoretical results, namely conformal prediction with side information and conformal training with our upper bounds on the conditional entropy as optimization objectives. We focus our experiments on classification tasks since this is the most common setting in previous works in conformal training [8, 13, 61]. In Table 1, we report the empirical inefficiency on test data considering two SCP methods, threshold CP with probabilities (or THR) [56] and APS [55]; see Appendix G.1.1 for results with RAPS [4]. |
| Researcher Affiliation | Industry | Qualcomm AI Research Amsterdam, The Netherlands {acorreia, fmassoli, clouizos, behboodi}@qti.qualcomm.com |
| Pseudocode | Yes | Algorithm 1: Conformal training algorithm. |
| Open Source Code | Yes | The source code will be hosted at github.com/Qualcomm-AI-research/info_cp. |
| Open Datasets | Yes | We test the effectiveness of our upper bounds as objectives for conformal training in five data sets: MNIST [29], Fashion-MNIST [69], EMNIST [11], CIFAR10 and CIFAR100 [25]. |
| Dataset Splits | Yes | For each data set, we use the default train and test splits but transfer 10% of the training data to the test data set. We train the classifiers only on the remaining 90% of the training data and, at test time, run SCP with 10 different calibration/test splits by randomly splitting the enlarged test data set. |
| Hardware Specification | No | The paper mentions 'commercially available NVIDIA GPUs' but does not provide specific model numbers or detailed hardware specifications. |
| Software Dependencies | No | The paper mentions 'Python 3', 'Pytorch [49]', and 'torchvision [40]' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We followed the experimental procedure of [61], and for each dataset and each method, we ran a grid search over the following hyperparameters using ray tune [45]: Batch size with possible values in {100, 500, 1000}. Learning rate with possible values in {0.05, 0.01, 0.005}. Temperature used in relaxing the construction of prediction sets at training time. We considered temperature values in {0.01, 0.1, 0.5, 1.0}. Steepness of the differentiable sorting algorithm (monotonic sorting networks with Cauchy distribution [50]), which regulates the smoothness of the sorting operator; the higher the steepness value, the closer the differentiable sorting operator is to standard sorting. We considered steepness values in {1, 10, 100}. |
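
The Research Type row above refers to split conformal prediction (SCP) with THR [56] and APS [55]. Below is a minimal sketch of SCP with the THR score, assuming softmax probabilities as input; the function name and the numpy implementation are illustrative, not taken from the paper's code.

```python
import numpy as np

def thr_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction with the THR score (1 - softmax prob of the true label).

    cal_probs:  (n, K) softmax probabilities on the calibration split
    cal_labels: (n,)   integer labels on the calibration split
    test_probs: (m, K) softmax probabilities on the test split
    Returns a boolean (m, K) matrix marking which labels enter each prediction set.
    """
    n = len(cal_labels)
    # Nonconformity score of the true label for each calibration example.
    cal_scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Conformal quantile with the finite-sample correction ceil((n + 1)(1 - alpha)) / n.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(cal_scores, level, method="higher")
    # A label is included whenever its score does not exceed the threshold.
    return (1.0 - test_probs) <= q_hat
```

The empirical inefficiency reported in the paper's Table 1 is the average size of such prediction sets; APS differs from THR only in its nonconformity score, which accumulates sorted class probabilities.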
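The Pseudocode row points to Algorithm 1, the paper's conformal training algorithm. The sketch below shows a generic conformal-training step in the spirit of [61]: the mini-batch is split into a calibration half and a prediction half, a soft threshold is computed on the calibration half, and a differentiable loss on the relaxed prediction sets is backpropagated. The paper instead optimizes its conditional-entropy upper bounds and uses differentiable sorting networks; the plain quantile and expected-set-size loss here are stand-ins, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def conformal_training_step(model, x, y, alpha=0.01, temperature=0.1):
    """One illustrative mini-batch step of a generic conformal-training objective.

    The batch is split in half: the 'calibration' half estimates a threshold on
    the true-class probabilities, the other half is scored against it and
    penalized for large soft prediction sets. The paper's actual objectives are
    its conditional-entropy upper bounds; a plain size loss stands in here.
    """
    probs = F.softmax(model(x), dim=-1)
    n = x.shape[0] // 2
    cal_probs, cal_y = probs[:n], y[:n]
    pred_probs = probs[n:]

    # THR-style conformity score: softmax probability of the true label.
    cal_scores = cal_probs[torch.arange(n), cal_y]
    # Simple quantile as a stand-in for the differentiable sorting used in the paper.
    tau = torch.quantile(cal_scores, alpha)

    # Soft set membership: sigmoid relaxation of the indicator {p_k >= tau}.
    soft_sets = torch.sigmoid((pred_probs - tau) / temperature)
    # Penalize the expected prediction-set size beyond one label per example.
    size_loss = torch.clamp(soft_sets.sum(dim=-1) - 1.0, min=0.0).mean()
    return size_loss
```

The temperature and the smoothness of the thresholding step play the same role as the temperature and steepness hyperparameters searched over in the Experiment Setup row.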
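The Dataset Splits row describes moving 10% of the default training split into the test pool, training on the remaining 90%, and drawing 10 random calibration/test splits from the enlarged pool. A small index-only sketch of that protocol follows, assuming a 50/50 calibration/test ratio (the exact ratio is not stated in the excerpt); all names are illustrative.

```python
import numpy as np

def make_splits(n_train, n_test, n_trials=10, seed=0):
    """Index bookkeeping for the split protocol quoted above (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    train_idx = rng.permutation(n_train)
    n_moved = int(0.1 * n_train)
    # 10% of the training data moves into the test pool; the rest trains the classifier.
    moved, kept_train = train_idx[:n_moved], train_idx[n_moved:]
    pool_size = n_test + n_moved  # enlarged test pool

    splits = []
    for _ in range(n_trials):
        pool = rng.permutation(pool_size)
        # Random calibration/test split of the enlarged pool (50/50 assumed here).
        half = pool_size // 2
        splits.append((pool[:half], pool[half:]))
    return kept_train, moved, splits
```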
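The Experiment Setup row quotes a grid search run with ray tune [45]. Below is a minimal sketch of that search space using Ray Tune's grid_search; train_model is a hypothetical trainable and the reported metric is a placeholder, not the paper's actual training code.

```python
from ray import tune

# Hyperparameter grid quoted in the Experiment Setup row.
search_space = {
    "batch_size": tune.grid_search([100, 500, 1000]),
    "learning_rate": tune.grid_search([0.05, 0.01, 0.005]),
    "temperature": tune.grid_search([0.01, 0.1, 0.5, 1.0]),
    "steepness": tune.grid_search([1, 10, 100]),
}

def train_model(config):
    # Placeholder: train with config["batch_size"], config["learning_rate"], etc.
    # Returning a dict reports it to Ray Tune as the trial's final result.
    return {"val_loss": 0.0}

tuner = tune.Tuner(train_model, param_space=search_space)
results = tuner.fit()
```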