Class-Conditional Conformal Prediction with Many Classes
Authors: Tiffany Ding, Anastasios Angelopoulos, Stephen Bates, Michael Jordan, Ryan J. Tibshirani
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on empirical evaluation across four image data sets with many (up to 1000) classes, we find that clustered conformal typically outperforms existing methods in terms of class-conditional coverage and set size metrics. |
| Researcher Affiliation | Academia | Tiffany Ding, University of California, Berkeley (tiffany_ding@berkeley.edu); Anastasios N. Angelopoulos, University of California, Berkeley (angelopoulos@berkeley.edu); Stephen Bates, MIT (s_bates@mit.edu); Michael I. Jordan, University of California, Berkeley (jordan@cs.berkeley.edu); Ryan J. Tibshirani, University of California, Berkeley (ryantibs@berkeley.edu) |
| Pseudocode | No | No explicit pseudocode or algorithm block with a 'Pseudocode' or 'Algorithm' label was found. Procedures are described in text. |
| Open Source Code | Yes | Code for reproducing our experiments is available at https://github.com/tiffanyding/class-conditional-conformal. |
| Open Datasets | Yes | We run experiments on the ImageNet (Russakovsky et al., 2015), CIFAR-100 (Krizhevsky, 2009), Places365 (Zhou et al., 2018), and iNaturalist (Van Horn et al., 2018) image classification data sets |
| Dataset Splits | Yes | We construct calibration sets of varying size by changing the average number of points in each class, denoted n_avg. For each n_avg ∈ {10, 20, 30, 40, 50, 75, 100, 150}, we construct a calibration set D_cal by sampling n_avg · \|Y\| examples without replacement from the remaining data D^c_fine (where c denotes the set complement). We estimate the conformal quantiles for STANDARD, CLASSWISE, and CLUSTERED on D_cal. The remaining data (D_fine ∪ D_cal)^c is used as the validation set for computing coverage and set size metrics. (A minimal split-construction sketch is given after the table.) |
| Hardware Specification | No | The paper mentions using a ResNet-50 model and PyTorch for training, but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' and 'sklearn.cluster.KMeans, with the default settings (Pedregosa et al., 2011)', but does not provide specific version numbers for these software dependencies, which are required for reproducibility. |
| Experiment Setup | Yes | Throughout, we set α = 0.1 for a desired coverage level of 90%. In all of our experiments, we use λ = 0.01 and k_reg = 5, which Angelopoulos et al. (2021) found to work well for ImageNet. For CLUSTERED, we choose γ ∈ [0, 1] (the fraction of calibration data points used for clustering) and M ≥ 1 (the number of clusters)... we set γ = K/(75 + K) and M = ⌊γn/2⌋. (A sketch of these hyperparameter heuristics and the KMeans clustering call follows the table.) |
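For reference, a minimal sketch of the calibration/validation split described in the Dataset Splits row, assuming a NumPy array of labels and an average of n_avg calibration points per class. The helper name and sampling details are illustrative, not taken from the paper's released code (which draws calibration points from the complement of the fine-tuning set D_fine).

```python
import numpy as np

def make_calibration_split(labels, n_avg, rng=None):
    """Sample n_avg * |Y| points without replacement for calibration;
    everything left over becomes the validation set.
    (Hypothetical helper, not from the paper's repository.)"""
    rng = np.random.default_rng(rng)
    n_classes = len(np.unique(labels))
    all_idx = np.arange(len(labels))
    cal_idx = rng.choice(all_idx, size=n_avg * n_classes, replace=False)
    val_idx = np.setdiff1d(all_idx, cal_idx)  # remaining data -> validation
    return cal_idx, val_idx

# Example: a 100-class problem with n_avg = 10 gives 1,000 calibration points.
labels = np.random.default_rng(0).integers(0, 100, size=50_000)
cal_idx, val_idx = make_calibration_split(labels, n_avg=10, rng=1)
print(len(cal_idx), len(val_idx))  # 1000 49000
```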
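Likewise, a hedged sketch of the CLUSTERED hyperparameter heuristics and the KMeans clustering step quoted in the rows above. Treating n as the average per-class calibration count n_avg, as well as the `class_features` array and the helper name, are assumptions made for illustration; they are not the paper's exact procedure.

```python
import math
import numpy as np
from sklearn.cluster import KMeans

def clustered_hyperparameters(n_avg, n_classes):
    """gamma = K/(75 + K) and M = floor(gamma * n / 2), as quoted above.
    Interpreting n as the average per-class calibration count n_avg is an
    assumption for this sketch."""
    gamma = n_classes / (75 + n_classes)
    M = max(1, math.floor(gamma * n_avg / 2))
    return gamma, M

gamma, M = clustered_hyperparameters(n_avg=50, n_classes=100)

# The paper reports clustering with sklearn.cluster.KMeans at default settings.
# `class_features` is a hypothetical (K, d) array of per-class score summaries,
# standing in for whatever per-class embedding is actually clustered.
class_features = np.random.default_rng(0).normal(size=(100, 8))
cluster_labels = KMeans(n_clusters=M).fit_predict(class_features)
```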