reproducibilityindex.ai

Meta-Cal: Well-controlled Post-hoc Calibration by Ranking

Authors: Xingchen Ma, Matthew B. Blaschko

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results on CIFAR-10, CIFAR-100 and ImageNet and a range of popular network architectures show our proposed method significantly outperforms the current state of the art for post-hoc multi-class classification calibration.
Researcher Affiliation	Academia	ESAT-PSI, KU Leuven, Belgium. Correspondence to: Xingchen Ma <xingchen.ma@esat.kuleuven.be>.
Pseudocode	Yes	Algorithm 1 Meta-Cal (miscoverage control) and Algorithm 2 Meta-Cal (coverage accuracy control).
Open Source Code	Yes	Code is available at https://github.com/maxc01/metacal
Open Datasets	Yes	For CIFAR-10 and CIFAR-100, the following networks are used: DenseNet (Huang et al., 2016a), ResNet (He et al., 2015), ResNet with stochastic depth (Huang et al., 2016b), Wide ResNet (Zagoruyko & Komodakis, 2016). 45000 out of 60000 images are used for training these classifiers. The remaining 15000 images are held out for training and evaluating post-hoc calibration methods. For ImageNet, we use pre-trained DenseNet-161 and ResNet-152 from PyTorch (Paszke et al., 2019).
Dataset Splits	Yes	The remaining 15000 images are held out for training and evaluating post-hoc calibration methods. The training details are given in Supplement C. These 15000 samples are randomly split into 5000/10000 samples to train and evaluate a post-hoc calibration method. For ImageNet, we use pre-trained DenseNet-161 and ResNet-152 from PyTorch (Paszke et al., 2019). 50000 images in the validation set are used for training and evaluating post-hoc calibration methods. To train and test a calibration map, we randomly split these samples into 25000/25000 images.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies	No	The paper mentions using PyTorch but does not specify its version or any other software dependencies with version numbers.
Experiment Setup	Yes	The experimental configurations specific to our proposed approach are as follows. For Meta-Cal under the miscoverage rate constraint, we set the miscoverage rate tolerance to be 0.05 for all neural network classifiers and all data sets used in the experiments. For Meta-Cal under the coverage accuracy constraint, we set the desired coverage accuracy to be 0.97, 0.87, 0.85 for CIFAR-10, CIFAR-100 and ImageNet, respectively. In both settings, we randomly select 1/10 samples (up to 500 samples) from the calibration data set to construct a binary classifier or estimate the coverage accuracy transformation function.
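
The split sizes and the "1/10 samples (up to 500 samples)" rule quoted in the Dataset Splits and Experiment Setup rows can be illustrated with a minimal sketch. This is not taken from the authors' repository: the function names `random_split` and `subsample_size`, the seed value, and the use of Python's standard `random` module are all assumptions made for illustration. The paper excerpts above do not state how the random split is seeded or implemented.

```python
import random


def random_split(n_total, n_first, seed=0):
    """Shuffle indices 0..n_total-1 and split them into two disjoint parts.

    Hypothetical helper: the seed and shuffling method are assumptions,
    not details reported in the table above.
    """
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    return indices[:n_first], indices[n_first:]


def subsample_size(n_calib, divisor=10, cap=500):
    """Number of calibration samples to draw: one tenth, capped at 500."""
    return min(n_calib // divisor, cap)


# CIFAR-10 / CIFAR-100: 15000 held-out images split 5000 / 10000.
cifar_fit, cifar_eval = random_split(15000, 5000)

# ImageNet: 50000 validation images split 25000 / 25000.
imagenet_fit, imagenet_eval = random_split(50000, 25000)

# Size of the subset used to build the binary classifier or to estimate
# the coverage accuracy transformation, under the quoted rule.
cifar_subset = subsample_size(len(cifar_fit))
imagenet_subset = subsample_size(len(imagenet_fit))
```

Under this reading of the rule, the cap is what determines the ImageNet subset size, since one tenth of 25000 calibration images exceeds 500, while for CIFAR one tenth of 5000 equals the cap exactly.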