Meta-Cal: Well-controlled Post-hoc Calibration by Ranking
Authors: Xingchen Ma, Matthew B. Blaschko
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on CIFAR-10, CIFAR-100 and ImageNet and a range of popular network architectures show our proposed method significantly outperforms the current state of the art for post-hoc multi-class classification calibration. |
| Researcher Affiliation | Academia | ESAT-PSI, KU Leuven, Belgium. Correspondence to: Xingchen Ma <xingchen.ma@esat.kuleuven.be>. |
| Pseudocode | Yes | Algorithm 1 Meta-Cal (miscoverage control) and Algorithm 2 Meta-Cal (coverage accuracy control). |
| Open Source Code | Yes | Code is available at https://github.com/maxc01/metacal |
| Open Datasets | Yes | For CIFAR-10 and CIFAR-100, the following networks are used: DenseNet (Huang et al., 2016a), ResNet (He et al., 2015), ResNet with stochastic depth (Huang et al., 2016b), Wide ResNet (Zagoruyko & Komodakis, 2016). 45000 out of 60000 images are used for training these classifiers. The remaining 15000 images are held out for training and evaluating post-hoc calibration methods. For ImageNet, we use pre-trained DenseNet-161 and ResNet-152 from PyTorch (Paszke et al., 2019). |
| Dataset Splits | Yes | The remaining 15000 images are held out for training and evaluating post-hoc calibration methods. The training details are given in Supplement C. These 15000 samples are randomly split into 5000/10000 samples to train and evaluate a post-hoc calibration method. For ImageNet, we use pre-trained DenseNet-161 and ResNet-152 from PyTorch (Paszke et al., 2019). 50000 images in the validation set are used for training and evaluating post-hoc calibration methods. To train and test a calibration map, we randomly split these samples into 25000/25000 images. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using PyTorch but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | The experimental configurations specific to our proposed approach are as follows. For Meta-Cal under the miscoverage rate constraint, we set the miscoverage rate tolerance to be 0.05 for all neural network classifiers and all data sets used in the experiments. For Meta-Cal under the coverage accuracy constraint, we set the desired coverage accuracy to be 0.97, 0.87, 0.85 for CIFAR-10, CIFAR-100 and ImageNet, respectively. In both settings, we randomly select 1/10 samples (up to 500 samples) from the calibration data set to construct a binary classifier or estimate the coverage accuracy transformation function. (See the sketch after this table for an illustration of the split and subset-selection protocol.) |
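
The data-partition protocol quoted in the Dataset Splits and Experiment Setup rows is concrete enough to sketch. The following is a minimal illustration, not the authors' released code: the variable names, the fixed seed, and the use of `train_test_split` are assumptions; only the sizes (a 5000/10000 calibration split on CIFAR and a 1/10 subset capped at 500 samples for Meta-Cal's binary classifier) come from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)  # seed is an assumption, not from the paper

# CIFAR: 60000 images total; 45000 train the base classifier and the
# remaining 15000 are held out for post-hoc calibration work.
held_out_indices = np.arange(15000)

# Randomly split the held-out pool 5000/10000 to train and evaluate
# a post-hoc calibration map, as described in the Dataset Splits row.
cal_train, cal_eval = train_test_split(
    held_out_indices, train_size=5000, random_state=0
)

# Meta-Cal additionally draws 1/10 of the calibration training set
# (capped at 500 samples) to construct its binary classifier or to
# estimate the coverage-accuracy transformation function.
n_subset = min(len(cal_train) // 10, 500)
meta_cal_subset = rng.choice(cal_train, size=n_subset, replace=False)

print(len(cal_train), len(cal_eval), len(meta_cal_subset))  # 5000 10000 500
```

For ImageNet the same sketch applies with a 25000/25000 split of the 50000 validation images; the 1/10 rule again caps the Meta-Cal subset at 500 samples.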