Confidence Calibration of Classifiers with Many Classes
Authors: Adrien Le Coz, Stéphane Herbin, Faouzi Adjed
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on numerous neural networks used for image or text classification and show that it significantly enhances existing calibration methods. Our code can be accessed at the following link: https://github.com/allglc/tva-calibration. |
| Researcher Affiliation | Collaboration | Adrien Le Coz (1,2,3), Stéphane Herbin (2,3), Faouzi Adjed (1) — 1: IRT SystemX, 2: ONERA DTIS, 3: Paris-Saclay University |
| Pseudocode | Yes | Algorithm 1 Top-versus-All approach to confidence calibration |
| Open Source Code | Yes | Our code can be accessed at the following link: https://github.com/allglc/tva-calibration. |
| Open Datasets | Yes | For image classification, we used the datasets CIFAR-10 (C10) and CIFAR-100 (C100) [27] with 10 and 100 classes respectively, ImageNet (IN) [7] with 1000 classes, and ImageNet-21K (IN21K) [54] with 10450 classes. For text classification, we used Amazon Fine Foods (AFF) [43] and DynaSent (DF) [51] for sentiment analysis with 3 classes, MNLI [66] for natural language inference with 3 classes, and Yahoo Answers (YA) [73] for topic classification on 10 classes. |
| Dataset Splits | Yes | Experiment results are averaged over five random seeds that randomly split the concatenation of the original validation and test sets into calibration and test sets. |
| Hardware Specification | Yes | Computing time (in seconds) of the calibration on ImageNet, using one NVIDIA V100 GPU. |
| Software Dependencies | Yes | We used PyTorch 2.0.0 [3] (BSD-style license). |
| Experiment Setup | Yes | For HB (histogram binning), we tested equal-size and equal-mass bins, and chose the best variant for each case. All hyperparameters were kept at their default values (10 bins for HB). |
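The Top-versus-All idea referenced in the pseudocode row (Algorithm 1) reduces multiclass calibration to a binary problem: keep only the top-class confidence and recalibrate it against whether the top prediction was correct. The sketch below is a hypothetical, minimal NumPy illustration of that reduction, paired with equal-size histogram binning over 10 bins as mentioned in the experiment-setup row; it is not the authors' implementation (see their repository at https://github.com/allglc/tva-calibration for the real code), and the function name and fallback for empty bins are assumptions.

```python
import numpy as np

def top_versus_all_binning(probs_cal, labels_cal, probs_test, n_bins=10):
    """Hypothetical sketch: Top-versus-All reduction + histogram binning.

    probs_cal, probs_test: (N, K) softmax probabilities.
    labels_cal: (N,) integer ground-truth labels for the calibration set.
    Returns (test predictions, recalibrated top-class confidences).
    """
    # Binary reduction: was the top prediction correct?
    preds_cal = probs_cal.argmax(axis=1)
    conf_cal = probs_cal.max(axis=1)                  # top-class confidence
    correct = (preds_cal == labels_cal).astype(float)  # binary target

    # Equal-size bins over [0, 1]; each bin maps confidence -> accuracy.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(conf_cal, edges) - 1, 0, n_bins - 1)
    bin_acc = np.array([
        correct[bin_idx == b].mean() if np.any(bin_idx == b)
        else (edges[b] + edges[b + 1]) / 2  # assumed fallback: bin midpoint
        for b in range(n_bins)
    ])

    # Recalibrate test confidences by bin lookup; predictions are unchanged.
    conf_test = probs_test.max(axis=1)
    test_idx = np.clip(np.digitize(conf_test, edges) - 1, 0, n_bins - 1)
    return probs_test.argmax(axis=1), bin_acc[test_idx]
```

Because only the scalar top-class confidence is calibrated, the cost is independent of the number of classes, which is the point of the approach for many-class settings such as ImageNet-21K.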