Top-label calibration and multiclass-to-binary reductions
Authors: Chirag Gupta, Aaditya Ramdas
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an empirical evaluation with four deep net architectures on CIFAR-10 and CIFAR-100, we find that the M2B + HB procedure achieves lower top-label and class-wise calibration error than other approaches such as temperature scaling. |
| Researcher Affiliation | Academia | Chirag Gupta & Aaditya Ramdas Carnegie Mellon University {chiragg,aramdas}@cmu.edu |
| Pseudocode | Yes | Algorithm 1: Confidence calibrator, Algorithm 2: Top-label calibrator, Algorithm 3: Class-wise calibrator, Algorithm 4: Normalized calibrator, Algorithm 5: Post-hoc calibrator for a given M2B calibration notion C, Algorithm 6: Top-K-label calibrator, Algorithm 7: Top-K-confidence calibrator, Algorithm 8: Top-label histogram binning, Algorithm 9: Class-wise histogram binning (an illustrative sketch of the top-label reduction appears after this table) |
| Open Source Code | Yes | Code for this work is available at https://github.com/aigen/df-posthoc-calibration. |
| Open Datasets | Yes | We experimented on the CIFAR-10 and CIFAR-100 datasets |
| Dataset Splits | Yes | Both CIFAR datasets consist of 60K (60,000) points, which are split 45K/5K/10K to form the train/validation/test sets (see the split sketch after this table). |
| Hardware Specification | No | This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562 (Towns et al., 2014). Specifically, it used the Bridges-2 system, which is supported by NSF award number ACI-1928147, at the Pittsburgh Supercomputing Center (PSC). This names computing resources (XSEDE, Bridges-2) but not specific hardware components such as GPU/CPU models or memory, so the exact hardware cannot be reproduced. |
| Software Dependencies | No | We also used the code at https://github.com/torrvision/focal_calibration for temperature scaling (TS). For vector scaling (VS) and Dirichlet scaling (DS), we used the code of Kull et al. (2019), hosted at https://github.com/dirichletcal/dirichlet_python. This mentions software by name and URL, but does not provide specific version numbers. |
| Experiment Setup | Yes | No hyperparameter tuning was performed in any of our histogram binning experiments or baseline experiments; default settings were used in every case. The random seed was fixed so that every run of the experiment gives the same result. Hyperparameters: # points per bin k ∈ ℕ (say 50), tie-breaking parameter δ > 0 (say 10^-10) (see the histogram binning sketch after this table). |
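To make the quoted algorithm names concrete, here is a minimal sketch of the multiclass-to-binary (M2B) reduction underlying the top-label calibrator (Algorithm 2). The function name and array layout are illustrative assumptions, not the authors' released code:

```python
import numpy as np

def top_label_m2b(probs, labels):
    """Reduce multiclass calibration to one binary problem per top label.

    probs  : (n, L) array of predicted class probabilities
    labels : (n,)   array of true class indices
    Returns {top label l: (confidence scores, correctness indicators)},
    so that a one-dimensional binary calibrator can be fit per top label.
    """
    top = probs.argmax(axis=1)               # predicted (top) label
    conf = probs.max(axis=1)                 # top-label confidence
    correct = (top == labels).astype(float)  # 1 if the prediction was right
    return {l: (conf[top == l], correct[top == l]) for l in np.unique(top)}
```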
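For the Dataset Splits row, a minimal sketch of the quoted 45K/5K/10K split, assuming the standard CIFAR packaging of 50K train and 10K test points; the seed and index handling are illustrative:

```python
import numpy as np

# Standard CIFAR packaging: 50K train + 10K test. Carving 5K validation
# points out of the train set yields the quoted 45K/5K/10K split.
rng = np.random.default_rng(0)  # fixed seed, as in the quoted setup
idx = rng.permutation(50_000)
train_idx, val_idx = idx[:45_000], idx[45_000:]
```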
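Finally, a sketch of histogram binning on one such binary reduction, using the quoted hyperparameters k (points per bin) and δ (tie-breaking). This is an illustrative reimplementation under uniform-mass binning assumptions, not the code released in the repository above:

```python
import numpy as np

def fit_histogram_binning(scores, outcomes, k=50, delta=1e-10, seed=0):
    """Fit uniform-mass histogram binning on one binary reduction.

    scores   : (n,) calibration-set confidence scores in [0, 1]
    outcomes : (n,) binary correctness indicators
    k        : target number of points per bin (paper suggests 50)
    delta    : tie-breaking noise scale (paper suggests 1e-10)
    """
    rng = np.random.default_rng(seed)
    s = scores + delta * rng.uniform(size=scores.shape)  # break ties
    n_bins = max(1, len(s) // k)
    # Bin edges at empirical quantiles -> roughly k points per bin.
    edges = np.quantile(s, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    bin_ids = np.searchsorted(edges, s, side="right") - 1
    # Calibrated value of each bin = mean outcome in that bin.
    bin_means = np.array([outcomes[bin_ids == b].mean() for b in range(n_bins)])
    return edges, bin_means

def predict_histogram_binning(edges, bin_means, scores):
    """Map new scores to the mean outcome of their calibration-set bin."""
    bin_ids = np.searchsorted(edges, scores, side="right") - 1
    return bin_means[np.clip(bin_ids, 0, len(bin_means) - 1)]
```

Applying `fit_histogram_binning` separately to each top-label group returned by `top_label_m2b` mirrors the M2B + HB pattern whose calibration error is evaluated in the paper.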