Meta-Cal: Well-controlled Post-hoc Calibration by Ranking

Authors: Xingchen Ma, Matthew B. Blaschko

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on CIFAR-10, CIFAR-100 and ImageNet and a range of popular network architectures show our proposed method significantly outperforms the current state of the art for post-hoc multi-class classification calibration.
Researcher Affiliation | Academia | ESAT-PSI, KU Leuven, Belgium. Correspondence to: Xingchen Ma <xingchen.ma@esat.kuleuven.be>.
Pseudocode | Yes | Algorithm 1 Meta-Cal (miscoverage control) and Algorithm 2 Meta-Cal (coverage accuracy control).
Open Source Code | Yes | Code is available at https://github.com/maxc01/metacal
Open Datasets | Yes | For CIFAR-10 and CIFAR-100, the following networks are used: DenseNet (Huang et al., 2016a), ResNet (He et al., 2015), ResNet with stochastic depth (Huang et al., 2016b), Wide ResNet (Zagoruyko & Komodakis, 2016). 45000 out of 60000 images are used for training these classifiers. The remaining 15000 images are held out for training and evaluating post-hoc calibration methods. For ImageNet, we use pre-trained DenseNet-161 and ResNet-152 from PyTorch (Paszke et al., 2019).
Dataset Splits | Yes | The remaining 15000 images are held out for training and evaluating post-hoc calibration methods. The training details are given in Supplement C. These 15000 samples are randomly split into 5000/10000 samples to train and evaluate a post-hoc calibration method. For ImageNet, we use pre-trained DenseNet-161 and ResNet-152 from PyTorch (Paszke et al., 2019). 50000 images in the validation set are used for training and evaluating post-hoc calibration methods. To train and test a calibration map, we randomly split these samples into 25000/25000 images. (These splits are illustrated in the first sketch below the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions using PyTorch but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | The experimental configurations specific to our proposed approach are as follows. For Meta-Cal under the miscoverage rate constraint, we set the miscoverage rate tolerance to be 0.05 for all neural network classifiers and all data sets used in the experiments. For Meta-Cal under the coverage accuracy constraint, we set the desired coverage accuracy to be 0.97, 0.87, 0.85 for CIFAR-10, CIFAR-100 and ImageNet, respectively. In both settings, we randomly select 1/10 samples (up to 500 samples) from the calibration data set to construct a binary classifier or estimate the coverage accuracy transformation function. (The second sketch below the table mirrors this subset selection.)
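
For concreteness, the calibration splits quoted in the Dataset Splits row can be reproduced with a plain random partition of the held-out samples. The sketch below is illustrative only: it assumes the classifier's held-out logits and labels are already available as NumPy arrays, and the function name, variable names, and seed are our own, not from the paper or its code release. The sizes match the quoted splits (CIFAR: 15000 held-out into 5000/10000; ImageNet: 50000 validation images into 25000/25000).

```python
# Illustrative calibration split, not the authors' code.
# `logits` and `labels` are assumed to be precomputed NumPy arrays
# over the held-out images (15000 for CIFAR, 50000 for ImageNet).
import numpy as np

def calibration_split(logits, labels, n_train, seed=0):
    """Randomly partition held-out samples into a set for training a
    post-hoc calibration map and a set for evaluating it."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(labels))
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return (logits[train_idx], labels[train_idx]), (logits[test_idx], labels[test_idx])

# CIFAR-10 / CIFAR-100: 15000 held-out -> 5000 train / 10000 evaluate
# (cal_logits, cal_labels), (eval_logits, eval_labels) = calibration_split(logits, labels, n_train=5000)
# ImageNet: 50000 validation -> 25000 train / 25000 evaluate
# (cal_logits, cal_labels), (eval_logits, eval_labels) = calibration_split(logits, labels, n_train=25000)
```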
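The Experiment Setup row also pins down a small subset-selection rule: 1/10 of the calibration samples, capped at 500, are set aside to construct Meta-Cal's binary classifier or to estimate the coverage accuracy transformation. The following sketch encodes that rule together with the quoted hyperparameters; the helper name and seed are hypothetical, and only the constants (0.05 tolerance; 0.97/0.87/0.85 targets; the 1/10-up-to-500 rule) come from the paper.

```python
# Hypothetical helper mirroring the quoted experimental configuration;
# only the numeric constants below are taken from the paper.
import numpy as np

MISCOVERAGE_TOL = 0.05  # miscoverage rate tolerance, shared across all models/datasets
COVERAGE_ACC = {"cifar10": 0.97, "cifar100": 0.87, "imagenet": 0.85}  # desired coverage accuracy

def reserve_ranking_subset(n_cal, seed=0):
    """Split calibration indices: min(n_cal // 10, 500) samples for the
    binary classifier / coverage-accuracy estimate, the rest for the
    base calibration map."""
    rng = np.random.default_rng(seed)
    n_rank = min(n_cal // 10, 500)
    perm = rng.permutation(n_cal)
    return perm[:n_rank], perm[n_rank:]

# e.g., a CIFAR run with 5000 calibration samples:
# 500 samples go to the ranking/binary-classifier step, 4500 to calibration.
rank_idx, cal_idx = reserve_ranking_subset(5000)
```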