reproducibilityindex.ai

Top-label calibration and multiclass-to-binary reductions

Authors: Chirag Gupta, Aaditya Ramdas

ICLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In an empirical evaluation with four deep net architectures on CIFAR-10 and CIFAR-100, we find that the M2B + HB procedure achieves lower top-label and class-wise calibration error than other approaches such as temperature scaling.
Researcher Affiliation	Academia	Chirag Gupta & Aaditya Ramdas, Carnegie Mellon University, {chiragg,aramdas}@cmu.edu
Pseudocode	Yes	Algorithm 1: Confidence calibrator, Algorithm 2: Top-label calibrator, Algorithm 3: Class-wise calibrator, Algorithm 4: Normalized calibrator, Algorithm 5: Post-hoc calibrator for a given M2B calibration notion C, Algorithm 6: Top-K-label calibrator, Algorithm 7: Top-K-confidence calibrator, Algorithm 8: Top-label histogram binning, Algorithm 9: Class-wise histogram binning
Open Source Code	Yes	Code for this work is available at https://github.com/aigen/df-posthoc-calibration.
Open Datasets	Yes	We experimented on the CIFAR-10 and CIFAR-100 datasets.
Dataset Splits	Yes	Both CIFAR datasets consist of 60K (60,000) points, which are split as 45K/5K/10K to form the train/validation/test sets.
Hardware Specification	No	This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562 (Towns et al., 2014). Specifically, it used the Bridges-2 system, which is supported by NSF award number ACI-1928147, at the Pittsburgh Supercomputing Center (PSC). This names the computing resources but not specific hardware components such as GPU/CPU models or memory, so the hardware setup cannot be reproduced exactly.
Software Dependencies	No	We also used the code at https://github.com/torrvision/focal_calibration for temperature scaling (TS). For vector scaling (VS) and Dirichlet scaling (DS), we used the code of Kull et al. (2019), hosted at https://github.com/dirichletcal/dirichlet_python. This identifies the software by name and URL but gives no version numbers.
Experiment Setup	Yes	No hyperparameter tuning was performed in any of our histogram binning experiments or baseline experiments; default settings were used in every case. The random seed was fixed so that every run of the experiment gives the same result. Hyperparameters: # points per bin k ∈ ℕ (say 50), tie-breaking parameter δ > 0 (say 10^-10).
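The Research Type row reports top-label and class-wise calibration error. A common way to estimate such quantities is with binned plug-in estimators; the sketch below is one such estimator under stated assumptions, not necessarily the estimator used in the paper. Assumed here: equal-width bins (15 by default, an illustrative choice), top-label ECE computed by binning confidences separately within each predicted class, and class-wise ECE averaged over classes. The function names `top_label_ece`, `class_wise_ece` and `_binned_gap` are hypothetical.

```python
import numpy as np


def _binned_gap(scores, targets, n, n_bins):
    """Sum over equal-width bins of (bin size / n) * |mean target - mean score|."""
    bin_ids = np.minimum((scores * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in np.unique(bin_ids):
        mask = bin_ids == b
        total += mask.sum() / n * abs(targets[mask].mean() - scores[mask].mean())
    return total


def top_label_ece(probs, labels, n_bins=15):
    """Binned top-label ECE (sketch): confidences are binned separately within
    each predicted class, so the estimate conditions on the class as well."""
    probs, labels = np.asarray(probs, dtype=float), np.asarray(labels)
    pred, conf = probs.argmax(axis=1), probs.max(axis=1)
    correct = (pred == labels).astype(float)
    n = len(labels)
    return sum(_binned_gap(conf[pred == cls], correct[pred == cls], n, n_bins)
               for cls in np.unique(pred))


def class_wise_ece(probs, labels, n_bins=15):
    """Binned class-wise ECE (sketch): binary ECE of probs[:, cls] for the event
    {y == cls}, averaged over all classes."""
    probs, labels = np.asarray(probs, dtype=float), np.asarray(labels)
    n, num_classes = probs.shape
    return sum(_binned_gap(probs[:, cls], (labels == cls).astype(float), n, n_bins)
               for cls in range(num_classes)) / num_classes
```

Binning within each predicted class is what separates this top-label estimate from the usual confidence ECE, which pools the confidences of all predicted classes into the same bins.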
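The Pseudocode row lists top-label and class-wise histogram binning (Algorithms 8 and 9) as instances of the multiclass-to-binary (M2B) reduction, and the Experiment Setup row gives the hyperparameters k (points per bin) and δ (tie-breaking). A minimal NumPy sketch of that idea follows; it is not the authors' code from the linked repository. Assumed details: uniform-mass bins with max(1, floor(n/k)) bins, ties broken by adding Uniform(0, δ) noise to calibration scores, and predicted classes unseen during fitting keeping their raw confidence. All function names are hypothetical.

```python
import numpy as np


def fit_binary_hb(scores, targets, points_per_bin=50, delta=1e-10, rng=None):
    """Uniform-mass histogram binning for one binary problem (sketch).

    Assumed: max(1, n // points_per_bin) bins; ties broken by adding
    Uniform(0, delta) noise to the calibration scores.
    Returns (upper_edges, bin_means).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    scores = scores + rng.uniform(0.0, delta, size=scores.shape)
    targets = np.asarray(targets, dtype=float)
    n_bins = max(1, len(scores) // points_per_bin)
    # Sort, then cut into n_bins near-equal-size groups (uniform-mass bins).
    chunks = np.array_split(np.argsort(scores), n_bins)
    upper_edges = np.array([scores[c[-1]] for c in chunks[:-1]])
    bin_means = np.array([targets[c].mean() for c in chunks])
    return upper_edges, bin_means


def predict_binary_hb(model, scores):
    upper_edges, bin_means = model
    # Bin index = number of upper edges strictly below the score.
    return bin_means[np.searchsorted(upper_edges, np.asarray(scores, dtype=float), side="left")]


def fit_top_label_hb(probs, labels, points_per_bin=50, delta=1e-10, seed=0):
    """Top-label M2B: for each predicted class cls, calibrate the max-probability
    of points with argmax == cls against the binary target 1{y == cls}."""
    probs, labels = np.asarray(probs, dtype=float), np.asarray(labels)
    rng = np.random.default_rng(seed)  # fixed seed: same tie-breaking noise every run
    pred, conf = probs.argmax(axis=1), probs.max(axis=1)
    return {
        cls: fit_binary_hb(conf[pred == cls], labels[pred == cls] == cls,
                           points_per_bin, delta, rng)
        for cls in np.unique(pred)
    }


def predict_top_label_hb(models, probs):
    probs = np.asarray(probs, dtype=float)
    pred, conf = probs.argmax(axis=1), probs.max(axis=1)
    for cls, model in models.items():
        mask = pred == cls
        if mask.any():
            conf[mask] = predict_binary_hb(model, conf[mask])
    # Classes never predicted during fitting keep their raw confidence.
    return pred, conf


def fit_class_wise_hb(probs, labels, points_per_bin=50, delta=1e-10, seed=0):
    """Class-wise M2B: one binary calibrator per class cls, fit on every point
    with score probs[:, cls] and target 1{y == cls}."""
    probs, labels = np.asarray(probs, dtype=float), np.asarray(labels)
    rng = np.random.default_rng(seed)
    return [fit_binary_hb(probs[:, cls], labels == cls, points_per_bin, delta, rng)
            for cls in range(probs.shape[1])]


def predict_class_wise_hb(models, probs):
    probs = np.asarray(probs, dtype=float)
    # Calibrated per-class scores; rows need not sum to 1.
    return np.column_stack([predict_binary_hb(m, probs[:, cls])
                            for cls, m in enumerate(models)])
```

For instance, 5,000 calibration points with k = 50 give 5000 // 50 = 100 bins for each class-wise calibrator, since every class sees all points; the top-label calibrator first splits the points by predicted class, so each of its binary calibrators gets fewer bins.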
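The Dataset Splits row describes a 45K/5K/10K train/validation/test split of 60,000 points, and the Experiment Setup row notes a fixed random seed. The sketch below shows one reproducible way to make such a split. It assumes, beyond what the rows state, that the 10K test set is CIFAR's standard 10,000-image test split and that the validation images are drawn from the standard 50,000-image training split; the function name `train_val_split` and the seed value 0 are hypothetical.

```python
import numpy as np

# Assumption: train/validation come from CIFAR's standard 50,000 training
# images; the standard 10,000 test images serve as the test set.
N_TRAIN_POOL = 50_000
N_VAL = 5_000


def train_val_split(seed=0):
    """Return (train_indices, val_indices) into the 50K training pool."""
    rng = np.random.default_rng(seed)  # fixed seed: identical split on every run
    perm = rng.permutation(N_TRAIN_POOL)
    return perm[N_VAL:], perm[:N_VAL]


train_idx, val_idx = train_val_split()
print(len(train_idx), len(val_idx))  # prints: 45000 5000
```

Together with the 10,000 test images, 45,000 + 5,000 + 10,000 = 60,000, matching the row.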