Confidence Calibration of Classifiers with Many Classes

Authors: Adrien Le Coz, Stéphane Herbin, Faouzi Adjed

NeurIPS 2024

Reproducibility variables, each listed with the assessed result and the supporting LLM response quoted from the paper:

Research Type (Experimental): "We evaluate our approach on numerous neural networks used for image or text classification and show that it significantly enhances existing calibration methods. Our code can be accessed at the following link: https://github.com/allglc/tva-calibration."

Researcher Affiliation (Collaboration): Adrien Le Coz (1, 2, 3), Stéphane Herbin (2, 3), Faouzi Adjed (1); affiliations: 1 IRT SystemX, 2 ONERA DTIS, 3 Paris-Saclay University.

Pseudocode (Yes): "Algorithm 1: Top-versus-All approach to confidence calibration"
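The paper's Algorithm 1 is not quoted in full here. As a rough illustration of the Top-versus-All idea (recasting multiclass calibration as a binary problem: is the top prediction correct or not), the following Python sketch fits Platt scaling on the top-class logit. The calibrator choice and all function names are assumptions for illustration, not the authors' exact implementation.

import numpy as np

def tva_platt_fit(logits, labels, lr=0.01, n_steps=2000):
    # Top-versus-All reduction: keep only the top-class logit per sample,
    # and a binary target saying whether the top prediction was correct.
    preds = logits.argmax(axis=1)
    z = logits.max(axis=1)                  # top-class logit per sample
    y = (preds == labels).astype(float)     # 1.0 if prediction correct
    # Platt scaling sigmoid(a*z + b), fit by gradient descent on log loss
    # (an illustrative binary calibrator, not necessarily the paper's).
    a, b = 1.0, 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(a * z + b)))
        grad = p - y                        # d(log loss)/d(a*z + b)
        a -= lr * float(np.mean(grad * z))
        b -= lr * float(np.mean(grad))
    return a, b

def tva_platt_confidence(logits, a, b):
    # Calibrated confidence for the predicted (top) class only.
    z = logits.max(axis=1)
    return 1.0 / (1.0 + np.exp(-(a * z + b)))

# Usage sketch: fit on the calibration split, apply on the test split, e.g.
# a, b = tva_platt_fit(cal_logits, cal_labels)
# conf = tva_platt_confidence(test_logits, a, b)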
Open Source Code (Yes): "Our code can be accessed at the following link: https://github.com/allglc/tva-calibration."

Open Datasets (Yes): "For image classification, we used the datasets CIFAR-10 (C10) and CIFAR-100 (C100) [27] with 10 and 100 classes respectively, ImageNet (IN) [7] with 1000 classes, and ImageNet-21K (IN21K) [54] with 10450 classes. For text classification, we used Amazon Fine Foods (AFF) [43] and DynaSent (DF) [51] for sentiment analysis with 3 classes, MNLI [66] for natural language inference with 3 classes, and Yahoo Answers (YA) [73] for topic classification on 10 classes."

Dataset Splits (Yes): "Experiment results are averaged over five random seeds that randomly split the concatenation of the original validation and test sets into calibration and test sets."
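A minimal sketch of that split protocol follows; the 50/50 calibration/test ratio is an assumption not stated in the quote, and the function name is hypothetical.

import numpy as np

def calibration_test_split(logits, labels, seed, cal_frac=0.5):
    # Shuffle the concatenated validation+test pool, then carve off a
    # calibration split; cal_frac=0.5 is an assumed ratio for illustration.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(labels))
    n_cal = int(cal_frac * len(labels))
    cal, test = perm[:n_cal], perm[n_cal:]
    return (logits[cal], labels[cal]), (logits[test], labels[test])

# Results are averaged over five random seeds, as stated in the quote:
for seed in range(5):
    ...  # fit the calibrator on the calibration split, evaluate on test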
Hardware Specification (Yes): "Computing time (in seconds) of the calibration on ImageNet, using one NVIDIA V100 GPU."

Software Dependencies (Yes): "We used PyTorch 2.0.0 [3] (BSD-style license)."

Experiment Setup (Yes): "For HB, we tested equal-size and equal-mass bins, and chose the best variant for each case. All hyperparameters were kept at their default values (10 bins for HB)."
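As background for the two HB variants named above, here is a minimal Python sketch of histogram binning with equal-size (uniform-width) and equal-mass (quantile) bins; it assumes the binary correct/incorrect formulation and 10 bins (the stated default), and is illustrative rather than the authors' implementation.

import numpy as np

def histogram_binning_fit(conf_cal, correct_cal, n_bins=10, equal_mass=False):
    # Two bin placements: equal-mass puts edges at quantiles so each bin
    # holds roughly the same number of samples; equal-size uses uniform
    # widths over [0, 1].
    if equal_mass:
        edges = np.quantile(conf_cal, np.linspace(0, 1, n_bins + 1))
    else:
        edges = np.linspace(0, 1, n_bins + 1)
    idx = np.clip(np.searchsorted(edges, conf_cal, side="right") - 1,
                  0, n_bins - 1)
    # Each bin's calibrated confidence is the empirical accuracy inside it;
    # empty bins fall back to the bin midpoint (an arbitrary choice here).
    bin_values = np.array([
        correct_cal[idx == b].mean() if np.any(idx == b)
        else (edges[b] + edges[b + 1]) / 2
        for b in range(n_bins)
    ])
    return edges, bin_values

def histogram_binning_apply(conf_test, edges, bin_values):
    # Map each test confidence to the calibrated value of its bin.
    idx = np.clip(np.searchsorted(edges, conf_test, side="right") - 1,
                  0, len(bin_values) - 1)
    return bin_values[idx]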