Class-Conditional Conformal Prediction with Many Classes

Authors: Tiffany Ding, Anastasios Angelopoulos, Stephen Bates, Michael Jordan, Ryan J. Tibshirani

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Based on empirical evaluation across four image data sets with many (up to 1000) classes, we find that clustered conformal typically outperforms existing methods in terms of classconditional coverage and set size metrics.
Researcher Affiliation | Academia | Tiffany Ding (University of California, Berkeley, tiffany_ding@berkeley.edu); Anastasios N. Angelopoulos (University of California, Berkeley, angelopoulos@berkeley.edu); Stephen Bates (MIT, s_bates@mit.edu); Michael I. Jordan (University of California, Berkeley, jordan@cs.berkeley.edu); Ryan J. Tibshirani (University of California, Berkeley, ryantibs@berkeley.edu)
Pseudocode | No | No explicit pseudocode or algorithm block with a 'Pseudocode' or 'Algorithm' label was found. Procedures are described in text.
Open Source Code | Yes | Code for reproducing our experiments is available at https://github.com/tiffanyding/class-conditional-conformal.
Open Datasets | Yes | We run experiments on the ImageNet (Russakovsky et al., 2015), CIFAR-100 (Krizhevsky, 2009), Places365 (Zhou et al., 2018), and iNaturalist (Van Horn et al., 2018) image classification data sets.
Dataset Splits | Yes | We construct calibration sets of varying size by changing the average number of points in each class, denoted n_avg. For each n_avg ∈ {10, 20, 30, 40, 50, 75, 100, 150}, we construct a calibration set D_cal by sampling n_avg · |Y| examples without replacement from the remaining data D_fine^c (where ^c denotes the set complement). We estimate the conformal quantiles for STANDARD, CLASSWISE, and CLUSTERED on D_cal. The remaining data (D_fine ∪ D_cal)^c is used as the validation set for computing coverage and set size metrics. (A sampling sketch follows the table.)
Hardware Specification | No | The paper mentions using a ResNet-50 model and PyTorch for training, but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' and 'sklearn.cluster.KMeans, with the default settings (Pedregosa et al., 2011)', but does not provide specific version numbers for these software dependencies, which are required for reproducibility. (A clustering sketch follows the table.)
Experiment Setup | Yes | Throughout, we set α = 0.1 for a desired coverage level of 90%. In all of our experiments, we use λ = 0.01 and k_reg = 5, which Angelopoulos et al. (2021) found to work well for ImageNet. For CLUSTERED, we choose γ ∈ [0, 1] (the fraction of calibration data points used for clustering) and M ≥ 1 (the number of clusters)... we set γ = K/(75 + K) and M = ⌊γn/2⌋. (Sketches of the score function and these heuristics follow the table.)
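
To make the Dataset Splits row concrete, here is a minimal sketch of the calibration/validation split it describes, assuming a pool of held-out indices standing in for D_fine^c; the pool size, class count, and seed are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool of indices standing in for D_fine^c (the data left
# over after fine-tuning); the sizes here are illustrative only.
num_classes = 100          # |Y|
pool = np.arange(50_000)

def split_calibration(pool, n_avg, num_classes, rng):
    """Sample n_avg * |Y| points without replacement for calibration;
    everything else in the pool becomes the validation set."""
    cal_idx = rng.choice(pool, size=n_avg * num_classes, replace=False)
    val_idx = np.setdiff1d(pool, cal_idx)   # plays the role of (D_fine ∪ D_cal)^c
    return cal_idx, val_idx

for n_avg in [10, 20, 30, 40, 50, 75, 100, 150]:
    cal_idx, val_idx = split_calibration(pool, n_avg, num_classes, rng)
    print(f"n_avg={n_avg}: {len(cal_idx)} calibration, {len(val_idx)} validation")
```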
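The Software Dependencies row points to sklearn.cluster.KMeans for the clustering step of CLUSTERED. The sketch below clusters classes via quantile embeddings of their per-class conformal scores; the quantile grid, the fake scores, and the variable names are our assumptions for illustration, not the paper's exact choices.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical per-class conformal scores: scores[k] holds the calibration
# scores for class k (faked here with random draws).
K = 100
scores = [rng.random(30) for _ in range(K)]

# Represent each class by a few quantiles of its score distribution
# (an assumed embedding; the quantile grid below is illustrative).
q_grid = [0.5, 0.6, 0.7, 0.8, 0.9]
embeddings = np.array([np.quantile(s, q_grid) for s in scores])

M = 4  # number of clusters
kmeans = KMeans(n_clusters=M).fit(embeddings)  # otherwise default settings
cluster_of_class = kmeans.labels_              # cluster assignment per class
print(cluster_of_class[:10])
```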
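The λ = 0.01 and k_reg = 5 in the Experiment Setup row are the regularization parameters of the RAPS-style score of Angelopoulos et al. (2021). Below is a minimal, deterministic sketch of that score (the randomized tie-breaking term of the original method is omitted); the function name and example probabilities are ours.

```python
import numpy as np

def raps_score(softmax_probs, label, lam=0.01, k_reg=5):
    """Deterministic RAPS-style conformal score: cumulative probability
    mass down to the true label, plus a rank penalty lam * (rank - k_reg)_+.
    The randomization term of the original method is omitted."""
    order = np.argsort(-softmax_probs)               # classes by decreasing prob
    rank = int(np.where(order == label)[0][0]) + 1   # 1-indexed rank of label
    cum_mass = softmax_probs[order][:rank].sum()
    return cum_mass + lam * max(0, rank - k_reg)

probs = np.array([0.5, 0.2, 0.1, 0.1, 0.05, 0.05])
print(raps_score(probs, label=2))  # mass 0.8, rank 3 <= k_reg, no penalty
```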
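Finally, a sketch of the CLUSTERED tuning heuristic γ = K/(75 + K), M = ⌊γn/2⌋, together with the standard split-conformal quantile estimated on D_cal. We read the n in the heuristic as the average points per class (n_avg); that reading, and the helper names, are assumptions.

```python
import numpy as np

alpha = 0.1  # target 90% coverage

def clustering_params(K, n_avg):
    """Heuristic from the setup: gamma = K / (75 + K), M = floor(gamma*n/2),
    reading n as n_avg (our assumption about the notation)."""
    gamma = K / (75 + K)
    M = int(np.floor(gamma * n_avg / 2))
    return gamma, M

def conformal_quantile(cal_scores, alpha):
    """Standard split-conformal quantile: the ceil((n+1)(1-alpha))/n
    empirical quantile of the calibration scores."""
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, level, method="higher")

print(clustering_params(K=1000, n_avg=50))   # e.g., ImageNet-scale
print(conformal_quantile(np.random.default_rng(0).random(500), alpha))
```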