Conformal Prediction for Deep Classifier via Label Ranking
Authors: Jianguo Huang, Huajun Xi, Linjun Zhang, Huaxiu Yao, Yue Qiu, Hongxin Wei
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate that SAPS not only lessens the prediction sets but also broadly enhances the conditional coverage rate of prediction sets. To verify the effectiveness of our method, we conduct thorough empirical evaluations on common benchmarks, including CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and ImageNet (Deng et al., 2009). |
| Researcher Affiliation | Academia | 1Department of Statistics and Data Science, Southern University of Science and Technology. 2School of Information Science and Technology, ShanghaiTech University. 3Department of Statistics, Rutgers University. 4Department of Computer Science, University of North Carolina at Chapel Hill. 5College of Mathematics and Statistics, Chongqing University. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at https://github.com/ml-stat-Sustech/conformal_prediction_via_label_ranking. |
| Open Datasets | Yes | Datasets. We consider three prominent datasets in our experiments: ImageNet (Deng et al., 2009), CIFAR-100 and CIFAR-10 (Krizhevsky et al., 2009), which are common benchmarks for conformal prediction. |
| Dataset Splits | Yes | For ImageNet, its test dataset of 50,000 images is divided, allocating 30,000 images to the calibration set and 20,000 images to the test set. For both CIFAR-100 and CIFAR-10, the associated test dataset of 10,000 images is uniformly divided into two subsets: a calibration set and a test set, each comprising 5,000 images. Additionally, we provide the details about the calibration process of conformal prediction algorithms, as shown: 1. Split: we split the full calibration set into a validation set and a calibration set (20:80 in this work); |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., specific GPU models, CPU types, or memory). |
| Software Dependencies | No | The paper mentions calibrating models using "Temperature scaling procedure (Guo et al., 2017)" but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For methods that have hyperparameters, we choose the hyperparameter that achieves the smallest set size on a validation set, which is a subset of the calibration set. Specifically, we tune the regularization hyperparameter of RAPS in {0.001, 0.01, 0.1, 0.15, ..., 0.5} and hyperparameter λ in {0.02, 0.05, 0.1, 0.15, ..., 0.6} for SAPS. (A sketch of this split-and-tune procedure follows the table.) |
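
The "Dataset Splits" and "Experiment Setup" rows together describe a split-then-tune protocol: hold out a calibration set, carve a 20% validation subset out of it, and select the hyperparameter that yields the smallest average prediction set on that subset. The Python sketch below illustrates this flow under stated assumptions: the softmax outputs and labels are simulated, the miscoverage level `alpha = 0.1` is assumed rather than taken from the paper, and `all_class_scores` is a hypothetical rank-penalised placeholder score, not the paper's SAPS (or RAPS) score.

```python
# Minimal sketch of the split-and-tune protocol quoted above, using the
# CIFAR-10-sized numbers (10,000 test images -> 5,000 calibration / 5,000
# test, then a 20:80 validation/calibration split). Softmax outputs are
# simulated; the score function is a hypothetical placeholder.
import numpy as np

rng = np.random.default_rng(0)
N, K = 10_000, 10                              # CIFAR-10-sized toy problem
probs = rng.dirichlet(np.ones(K), size=N)      # stand-in for softmax outputs
labels = rng.integers(0, K, size=N)            # stand-in for true labels

# 1. Split the full test pool into calibration and test halves.
idx = rng.permutation(N)
calib_idx, test_idx = idx[:5_000], idx[5_000:]

# 2. Split the calibration half 20:80 into validation and calibration sets.
val_idx, cal_idx = calib_idx[:1_000], calib_idx[1_000:]

def all_class_scores(p, lam):
    """Hypothetical score for every (sample, class) pair:
    1 - p(y|x) + lam * rank(y|x), with 0-based ranks. Illustration only,
    NOT the SAPS score defined in the paper."""
    order = np.argsort(-p, axis=1)                       # classes by prob
    ranks = np.empty_like(order)
    ranks[np.arange(p.shape[0])[:, None], order] = np.arange(p.shape[1])
    return 1.0 - p + lam * ranks

alpha = 0.1                                    # assumed miscoverage level
best_lam, best_size = None, np.inf
for lam in (0.02, 0.05, 0.1, 0.15, 0.2):       # subset of the grid quoted above
    # Conformal quantile of the true-label scores on the calibration set.
    cal_scores = all_class_scores(probs[cal_idx], lam)[
        np.arange(len(cal_idx)), labels[cal_idx]]
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(cal_scores, level, method="higher")
    # Average prediction-set size on the held-out validation subset.
    size = (all_class_scores(probs[val_idx], lam) <= qhat).sum(1).mean()
    if size < best_size:
        best_lam, best_size = lam, size

print(f"selected lambda = {best_lam} (avg. validation set size {best_size:.2f})")
```

On the real benchmarks, the simulated probabilities and the placeholder score would be replaced by the model's actual softmax outputs and the authors' SAPS score; the split, conformal-quantile, and smallest-set-size selection steps are the parts pinned down by the quoted rows.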