Addressing Failure Prediction by Learning Model Confidence

Authors: Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, Patrick Pérez

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted to validate the relevance of the proposed approach. We study various network architectures and small- and large-scale datasets for image classification and semantic segmentation. We show that our approach consistently outperforms several strong methods, from MCP to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction.
Researcher Affiliation | Collaboration | Charles Corbière (1,2) charles.corbiere@valeo.com; Nicolas Thome (1) nicolas.thome@cnam.fr; Avner Bar-Hen (1) avner@cnam.fr; Matthieu Cord (2,3) matthieu.cord@lip6.fr; Patrick Pérez (2) patrick.perez@valeo.com. Affiliations: (1) CEDRIC, Conservatoire National des Arts et Métiers, Paris, France; (2) valeo.ai, Paris, France; (3) Sorbonne University, Paris, France.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/valeoai/ConfidNet.
Open Datasets | Yes | We run experiments on image datasets of varying scale and complexity: the MNIST [27] and SVHN [39] datasets provide relatively simple and small (28×28) images of digits (10 classes); CIFAR-10 and CIFAR-100 [24] propose more complex object recognition tasks on low-resolution images. We also report experiments for semantic segmentation on CamVid [5], a standard road scene dataset.
Dataset Splits | Yes | We report results on all datasets in Table 3 for validation sets with 10% of samples. (See the split sketch after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or specific cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names like PyTorch, TensorFlow, or scikit-learn with their respective versions) used for the experiments.
Experiment Setup | Yes | $\mathcal{L}_{\text{conf}}(\theta; \mathcal{D}) = \frac{1}{N} \sum_{i=1}^{N} \big(\hat{c}(x_i, \theta) - c^*(x_i, y_i^*)\big)^2$ (4). In the experimental part, we also tried more direct approaches for failure prediction, such as a binary cross-entropy (BCE) loss between the confidence network score and an incorrect/correct prediction target. We also tried implementing the focal loss [31], a BCE variant which focuses on hard examples. Finally, one can also see failure detection as a ranking problem where good predictions must be ranked before erroneous ones according to a confidence criterion; to this end, we also implemented a ranking loss [36, 7] applied locally on training batch inputs. Our complete confidence model, from input image to confidence score, shares its first encoding part (ConvNet in Fig. 2) with the classification model M. The training of ConfidNet starts by fixing M entirely (freezing w) and learning θ using loss (4). In a subsequent step, we can then fine-tune the ConvNet encoder. However, as model M has to remain fixed to compute identical classification predictions, we now have to decouple the feature encoders used for classification and confidence prediction, respectively. We also deactivate dropout layers in this last training phase and reduce the learning rate to mitigate stochastic effects that may lead the new encoder to deviate too much from the original one used for classification. Data augmentation can thus still be used. (A hedged training sketch follows the table.)
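
To make the reported 10% validation split concrete, here is a minimal PyTorch sketch. It assumes torchvision's CIFAR-10 loader, a random split, and a fixed seed; the paper does not specify its splitting code, so all of these are illustrative assumptions:

    import torch
    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    # Load CIFAR-10, one of the open datasets used in the paper.
    full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                                  transform=transforms.ToTensor())

    # Hold out 10% of the training samples as a validation set,
    # matching the split size reported for Table 3.
    val_size = len(full_train) // 10
    train_set, val_set = random_split(
        full_train, [len(full_train) - val_size, val_size],
        generator=torch.Generator().manual_seed(0))  # seed is an assumption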
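
The following is a self-contained sketch of stage 1 of the training scheme quoted in the Experiment Setup row: an auxiliary confidence head regressed with the MSE loss of Eq. (4) onto the True Class Probability (TCP) while the classifier M stays frozen. The architectures (TinyClassifier, ConfidNetHead), hidden sizes, and optimizer settings are illustrative assumptions, not the paper's exact models:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyClassifier(nn.Module):
        # Stand-in for the classification model M; the paper uses
        # ConvNets, so this small MLP is purely illustrative.
        def __init__(self, in_dim=784, feat_dim=128, n_classes=10):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
            self.fc = nn.Linear(feat_dim, n_classes)

        def forward(self, x):
            feats = self.encoder(x)
            return feats, self.fc(feats)

    class ConfidNetHead(nn.Module):
        # Auxiliary branch mapping shared features to a scalar confidence.
        def __init__(self, feat_dim=128, hidden=64):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1))

        def forward(self, feats):
            # Sigmoid keeps c_hat in [0, 1], the range of the TCP target.
            return torch.sigmoid(self.mlp(feats)).squeeze(-1)

    def confidence_loss(logits, c_hat, y):
        # Eq. (4): MSE between c_hat(x, theta) and the TCP target
        # c*(x, y*), i.e. the softmax probability that the frozen
        # classifier assigns to the ground-truth class.
        tcp = F.softmax(logits, dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
        return F.mse_loss(c_hat, tcp.detach())

    # Stage 1: fix M entirely (freeze w) and learn theta with loss (4).
    classifier, head = TinyClassifier(), ConfidNetHead()
    for p in classifier.parameters():
        p.requires_grad_(False)
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)

    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # dummy batch
    feats, logits = classifier(x)
    loss = confidence_loss(logits, head(feats), y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

Stage 2, as described in the quoted setup, would then duplicate the encoder so that ConfidNet's copy can be fine-tuned (with dropout deactivated and a reduced learning rate) while M's encoder, and hence its predictions, remain unchanged.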