Beyond calibration: estimating the grouping loss of modern neural networks

Authors: Alexandre Perez-Lebel, Marine Le Morvan, Gael Varoquaux

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate on simulations that the proposed estimator can provide tight lower bounds on the grouping loss (Section 5.1). We evidence for the first time the presence of grouping loss in pre-trained vision and language architectures, notably in distribution-shift settings (Section 5.2). (The loss decomposition that defines the grouping loss is sketched after the table.)
Researcher Affiliation | Academia | Alexandre Perez-Lebel, Marine Le Morvan, Gaël Varoquaux, Soda project team, Inria Saclay, Palaiseau, France
Pseudocode | No | The paper describes methods in text and mathematical formulations but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The source code for the implementation of the algorithm, experiments, simulations and figures is available on GitHub: https://github.com/aperezlebel/beyond_calibration.
Open Datasets | Yes | All datasets are publicly available (ImageNet-R, ImageNet-C, ImageNet-1K, Yahoo Answers Topics).
Dataset Splits | Yes | We divide the samples of the evaluation set in half, making sure that the confidence-score distribution is the same in both resulting subsets. On one set, we train the isotonic regression for calibration and calibrate the confidence scores of both sets. [...] with a 50-50 train-test split strategy. (This split-and-calibrate step is sketched in code after the table.)
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments.
Software Dependencies | Yes | Model architectures and weights are available in PyTorch v0.12 (Paszke et al., 2019). (Loading such a pre-trained model is sketched after the table.)
Experiment Setup | Yes | We build confidence scores by applying a softmax to the output logits. We extract a representation of the input images in the high-level feature space of the network. [...] We divide the samples [...] in half [...] we train the isotonic regression [...] Then, we create groups of same-level confidences by binning the confidence scores with 15 equal-width bins in [0, 1] [...] constrained to one balanced split, with a 50-50 train-test split strategy [...] typically targeting a region ratio of a dozen, to obtain the best possible lower bound GL_LB. (The confidence-extraction and binning steps are sketched after the table.)
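
To make the "Research Type" evidence easier to interpret: the grouping loss is the middle term of the classic proper-loss decomposition the paper builds on. The sketch below states it for the Brier (squared) loss in the binary case, with confidence score S = f(X) and outcome Y; the notation here is illustrative and may differ from the paper's exact formulation.

```latex
% Decomposition of the expected Brier loss of a confidence score S = f(X).
% The cross terms vanish because S is a deterministic function of X.
\mathbb{E}\big[(S - Y)^2\big]
  = \underbrace{\mathbb{E}\big[(S - \mathbb{E}[Y \mid S])^2\big]}_{\text{calibration loss}}
  + \underbrace{\mathbb{E}\big[(\mathbb{E}[Y \mid S] - \mathbb{E}[Y \mid X])^2\big]}_{\text{grouping loss}}
  + \underbrace{\mathbb{E}\big[(\mathbb{E}[Y \mid X] - Y)^2\big]}_{\text{irreducible loss}}
```

Calibration alone only controls the first term; the grouping loss captures how much the true posterior E[Y | X] still varies among samples that receive the same confidence.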
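The first two sentences of the "Experiment Setup" row can be illustrated with a short PyTorch sketch. The choice of resnet50 and the random `images` batch are placeholders, not the paper's exact models or data; the `pretrained=True` flag matches the torchvision API contemporary with v0.12 (later versions use `weights=`).

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Placeholder pre-trained model and input batch (the paper evaluates
# several pre-trained vision and language architectures).
model = models.resnet50(pretrained=True)
model.eval()
images = torch.randn(8, 3, 224, 224)  # stands in for preprocessed images

with torch.no_grad():
    logits = model(images)

# Confidence scores: softmax over the output logits, keeping the
# probability of the predicted (argmax) class.
probs = F.softmax(logits, dim=1)
conf, pred = probs.max(dim=1)

# High-level feature representation: one common choice is the
# penultimate-layer activations, obtained by dropping the final
# classification head.
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])
with torch.no_grad():
    feats = feature_extractor(images).flatten(1)  # shape (8, 2048)
```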
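The split, recalibration, and binning steps quoted in the "Dataset Splits" and "Experiment Setup" rows could be assembled as below. The synthetic `conf`/`correct` arrays stand in for real model outputs, and stratifying the 50-50 split on the 15 confidence bins is one simple way to keep the confidence distribution the same in both halves; the authors' exact procedure may differ.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: confidence scores in [0, 1] and top-1 correctness.
rng = np.random.default_rng(0)
conf = rng.uniform(0, 1, size=10_000)
correct = (rng.uniform(0, 1, size=10_000) < conf).astype(int)

# 15 equal-width bins in [0, 1] (the last bin is closed at 1).
bins = np.floor(conf * 15).clip(max=14).astype(int)

# 50-50 split of the evaluation set, stratified on the confidence bins
# so both halves share (approximately) the same confidence distribution.
idx_a, idx_b = train_test_split(
    np.arange(len(conf)), test_size=0.5, stratify=bins, random_state=0
)

# Train isotonic regression for calibration on one half, then
# recalibrate the confidence scores of both halves.
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(conf[idx_a], correct[idx_a])
conf_cal = iso.predict(conf)

# Groups of same-level confidence: re-bin the calibrated scores. Within
# each bin, the paper further partitions samples in feature space to
# lower-bound the grouping loss (not reproduced here).
groups = np.floor(conf_cal * 15).clip(max=14).astype(int)
```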