Beyond calibration: estimating the grouping loss of modern neural networks

Authors: Alexandre Perez-Lebel, Marine Le Morvan, Gael Varoquaux

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate on simulations that the proposed estimator can provide tight lower bounds on the grouping loss (Section 5.1). We evidence for the first time the presence of grouping loss in pre-trained vision and language architectures, notably in distribution-shift settings (Section 5.2). (The loss decomposition that defines the grouping loss is sketched after the table.)
Researcher Affiliation | Academia | Alexandre Perez-Lebel, Marine Le Morvan, Gaël Varoquaux, Soda project team, Inria Saclay, Palaiseau, France
Pseudocode | No | The paper describes methods in text and mathematical formulations but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The source code for the implementation of the algorithm, experiments, simulations and figures is available on GitHub: https://github.com/aperezlebel/beyond_calibration.
Open Datasets | Yes | All datasets are publicly available (ImageNet-R, ImageNet-C, ImageNet-1K, Yahoo Answers Topics).
Dataset Splits | Yes | We divide the samples of the evaluation set in half, making sure that the confidence-score distribution is the same in both resulting subsets. On one set, we train the isotonic regression for calibration and calibrate the confidence scores of both sets. [...] with a 50-50 train-test split strategy. (This split-and-calibrate step is sketched in code after the table.)
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments.
Software Dependencies | Yes | Model architectures and weights are available in PyTorch v0.12 (Paszke et al., 2019). (Loading such a pre-trained model is sketched after the table.)
Experiment Setup | Yes | We build confidence scores by applying a softmax to the output logits. We extract a representation of the input images in the high-level feature space of the network. [...] We divide the samples [...] in half [...] we train the isotonic regression [...] Then, we create groups of same-level confidences by binning the confidence scores with 15 equal-width bins in [0, 1] [...] constrained to one balanced split, with a 50-50 train-test split strategy [...] typically targeting a region ratio of a dozen, to obtain the best possible lower bound GL_LB. (The confidence-extraction and binning steps are sketched after the table.)
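
To make the "Research Type" evidence easier to interpret: the grouping loss is the middle term of the classic proper-loss decomposition the paper builds on. The sketch below states it for the Brier (squared) loss in the binary case, with confidence score S = f(X) and outcome Y; the notation here is illustrative and may differ from the paper's exact formulation.

```latex
% Decomposition of the expected Brier loss of a confidence score S = f(X).
% The cross terms vanish because S is a deterministic function of X.
\mathbb{E}\big[(S - Y)^2\big]
  = \underbrace{\mathbb{E}\big[(S - \mathbb{E}[Y \mid S])^2\big]}_{\text{calibration loss}}
  + \underbrace{\mathbb{E}\big[(\mathbb{E}[Y \mid S] - \mathbb{E}[Y \mid X])^2\big]}_{\text{grouping loss}}
  + \underbrace{\mathbb{E}\big[(\mathbb{E}[Y \mid X] - Y)^2\big]}_{\text{irreducible loss}}
```

Calibration alone only controls the first term; the grouping loss captures how much the true posterior E[Y | X] still varies among samples that receive the same confidence.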
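The first two sentences of the "Experiment Setup" row can be illustrated with a short PyTorch sketch. The choice of resnet50 and the random `images` batch are placeholders, not the paper's exact models or data; the `pretrained=True` flag matches the torchvision API contemporary with v0.12 (later versions use `weights=`).

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Placeholder pre-trained model and input batch (the paper evaluates
# several pre-trained vision and language architectures).
model = models.resnet50(pretrained=True)
model.eval()
images = torch.randn(8, 3, 224, 224)  # stands in for preprocessed images

with torch.no_grad():
    logits = model(images)

# Confidence scores: softmax over the output logits, keeping the
# probability of the predicted (argmax) class.
probs = F.softmax(logits, dim=1)
conf, pred = probs.max(dim=1)

# High-level feature representation: one common choice is the
# penultimate-layer activations, obtained by dropping the final
# classification head.
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])
with torch.no_grad():
    feats = feature_extractor(images).flatten(1)  # shape (8, 2048)
```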
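The split, recalibration, and binning steps quoted in the "Dataset Splits" and "Experiment Setup" rows could be assembled as below. The synthetic `conf`/`correct` arrays stand in for real model outputs, and stratifying the 50-50 split on the 15 confidence bins is one simple way to keep the confidence distribution the same in both halves; the authors' exact procedure may differ.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: confidence scores in [0, 1] and top-1 correctness.
rng = np.random.default_rng(0)
conf = rng.uniform(0, 1, size=10_000)
correct = (rng.uniform(0, 1, size=10_000) < conf).astype(int)

# 15 equal-width bins in [0, 1] (the last bin is closed at 1).
bins = np.floor(conf * 15).clip(max=14).astype(int)

# 50-50 split of the evaluation set, stratified on the confidence bins
# so both halves share (approximately) the same confidence distribution.
idx_a, idx_b = train_test_split(
    np.arange(len(conf)), test_size=0.5, stratify=bins, random_state=0
)

# Train isotonic regression for calibration on one half, then
# recalibrate the confidence scores of both halves.
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(conf[idx_a], correct[idx_a])
conf_cal = iso.predict(conf)

# Groups of same-level confidence: re-bin the calibrated scores. Within
# each bin, the paper further partitions samples in feature space to
# lower-bound the grouping loss (not reproduced here).
groups = np.floor(conf_cal * 15).clip(max=14).astype(int)
```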