Multi-Class Uncertainty Calibration via Mutual Information Maximization-based Binning
Authors: Kanil Patel, William H. Beluch, Bin Yang, Michael Pfeiffer, Dan Zhang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | I-Max is evaluated according to multiple performance metrics, including accuracy, ECE, Brier and NLL, and compared against benchmark calibration methods across multiple datasets and trained classifiers. |
| Researcher Affiliation | Collaboration | Bosch Center for Artificial Intelligence, Renningen, Germany; Institute of Signal Processing and System Theory, University of Stuttgart, Stuttgart, Germany |
| Pseudocode | Yes | See Algo. 1 in the appendix for pseudocode. |
| Open Source Code | Yes | Code available at https://github.com/boschresearch/imax-calibration |
| Open Datasets | Yes | We evaluate post-hoc calibration methods on four benchmark datasets, i.e., ImageNet (Deng et al., 2009), CIFAR10/100 (Krizhevsky, 2009) and SVHN (Netzer et al., 2011). |
| Dataset Splits | Yes | We perform class-balanced random splits of each dataset's test set, unless stated otherwise: the calibration and evaluation set sizes are both 25k for ImageNet, and 5k for CIFAR10/100. |
| Hardware Specification | No | No explicit details on specific GPU models, CPU models, or other hardware specifications used for running experiments were found. |
| Software Dependencies | No | No specific software versions (e.g., Python 3.x, PyTorch 1.x) or library version numbers (e.g., Numpy X.Y, sklearn X.Y) are provided. |
| Experiment Setup | Yes | All scaling methods use the Adam optimizer with batch size 256 for CIFAR and 4096 for ImageNet. The learning rate was set to 10^-3 for temperature scaling (Guo et al., 2017) and Platt scaling (Platt, 1999), 10^-4 for vector scaling (Guo et al., 2017), and 10^-5 for matrix scaling (Guo et al., 2017). |
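The class-balanced split protocol reported above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `class_balanced_split` and the NumPy-based interface are assumptions; only the protocol (equal per-class contribution to the calibration set, remainder for evaluation) comes from the table.

```python
import numpy as np

def class_balanced_split(labels, calib_size_per_class, seed=0):
    """Split indices into class-balanced calibration/evaluation sets.

    Illustrative sketch of the protocol described in the table: each
    class contributes the same number of samples to the calibration
    set; the remaining samples form the evaluation set.
    """
    rng = np.random.default_rng(seed)
    calib_idx, eval_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)   # all samples of class c
        rng.shuffle(idx)                    # random, per-class shuffle
        calib_idx.append(idx[:calib_size_per_class])
        eval_idx.append(idx[calib_size_per_class:])
    return np.concatenate(calib_idx), np.concatenate(eval_idx)

# Example: a CIFAR-10-style test set of 10k samples over 10 classes,
# split into 5k calibration / 5k evaluation (500 per class each).
labels = np.repeat(np.arange(10), 1000)
calib, evaluation = class_balanced_split(labels, 500)
```

For ImageNet the same routine would be applied with 25 samples per class over the 1000 classes of its 50k test set, yielding the 25k/25k split quoted above.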