Addressing Failure Prediction by Learning Model Confidence

Authors: Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, Patrick Pérez

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted to validate the relevance of the proposed approach. We study various network architectures and small- and large-scale datasets for image classification and semantic segmentation. We show that our approach consistently outperforms several strong methods, from MCP to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction.
Researcher Affiliation | Collaboration | Charles Corbière (1,2) charles.corbiere@valeo.com; Nicolas Thome (1) nicolas.thome@cnam.fr; Avner Bar-Hen (1) avner@cnam.fr; Matthieu Cord (2,3) matthieu.cord@lip6.fr; Patrick Pérez (2) patrick.perez@valeo.com. Affiliations: (1) CEDRIC, Conservatoire National des Arts et Métiers, Paris, France; (2) valeo.ai, Paris, France; (3) Sorbonne University, Paris, France.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/valeoai/ConfidNet.
Open Datasets | Yes | We run experiments on image datasets of varying scale and complexity: the MNIST [27] and SVHN [39] datasets provide relatively simple and small (28×28) images of digits (10 classes); CIFAR-10 and CIFAR-100 [24] propose more complex object recognition tasks on low-resolution images. We also report experiments for semantic segmentation on CamVid [5], a standard road scene dataset.
Dataset Splits | Yes | We report results on all datasets in Table 3 for validation sets with 10% of samples. (See the split sketch after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or specific cloud instances) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names like PyTorch, TensorFlow, or scikit-learn with their respective versions) used for the experiments.
Experiment Setup | Yes | $\mathcal{L}_{\text{conf}}(\theta; \mathcal{D}) = \frac{1}{N} \sum_{i=1}^{N} \big(\hat{c}(x_i, \theta) - c^*(x_i, y_i^*)\big)^2$ (4). In the experimental part, we also tried more direct approaches for failure prediction, such as a binary cross-entropy (BCE) loss between the confidence network score and an incorrect/correct prediction target. We also tried implementing the focal loss [31], a BCE variant which focuses on hard examples. Finally, one can also see failure detection as a ranking problem where good predictions must be ranked before erroneous ones according to a confidence criterion; to this end, we also implemented a ranking loss [36, 7] applied locally on training batch inputs. Our complete confidence model, from input image to confidence score, shares its first encoding part (ConvNet in Fig. 2) with the classification model M. The training of ConfidNet starts by fixing M entirely (freezing w) and learning θ using loss (4). In a subsequent step, we can then fine-tune the ConvNet encoder. However, as model M has to remain fixed to compute identical classification predictions, we now have to decouple the feature encoders used for classification and confidence prediction, respectively. We also deactivate dropout layers in this last training phase and reduce the learning rate to mitigate stochastic effects that may lead the new encoder to deviate too much from the original one used for classification. Data augmentation can thus still be used. (A hedged training sketch follows the table.)
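
To make the reported 10% validation split concrete, here is a minimal PyTorch sketch. It assumes torchvision's CIFAR-10 loader, a random split, and a fixed seed; the paper does not specify its splitting code, so all of these are illustrative assumptions:

    import torch
    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    # Load CIFAR-10, one of the open datasets used in the paper.
    full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                                  transform=transforms.ToTensor())

    # Hold out 10% of the training samples as a validation set,
    # matching the split size reported for Table 3.
    val_size = len(full_train) // 10
    train_set, val_set = random_split(
        full_train, [len(full_train) - val_size, val_size],
        generator=torch.Generator().manual_seed(0))  # seed is an assumption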
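
The following is a self-contained sketch of stage 1 of the training scheme quoted in the Experiment Setup row: an auxiliary confidence head regressed with the MSE loss of Eq. (4) onto the True Class Probability (TCP) while the classifier M stays frozen. The architectures (TinyClassifier, ConfidNetHead), hidden sizes, and optimizer settings are illustrative assumptions, not the paper's exact models:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyClassifier(nn.Module):
        # Stand-in for the classification model M; the paper uses
        # ConvNets, so this small MLP is purely illustrative.
        def __init__(self, in_dim=784, feat_dim=128, n_classes=10):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
            self.fc = nn.Linear(feat_dim, n_classes)

        def forward(self, x):
            feats = self.encoder(x)
            return feats, self.fc(feats)

    class ConfidNetHead(nn.Module):
        # Auxiliary branch mapping shared features to a scalar confidence.
        def __init__(self, feat_dim=128, hidden=64):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1))

        def forward(self, feats):
            # Sigmoid keeps c_hat in [0, 1], the range of the TCP target.
            return torch.sigmoid(self.mlp(feats)).squeeze(-1)

    def confidence_loss(logits, c_hat, y):
        # Eq. (4): MSE between c_hat(x, theta) and the TCP target
        # c*(x, y*), i.e. the softmax probability that the frozen
        # classifier assigns to the ground-truth class.
        tcp = F.softmax(logits, dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
        return F.mse_loss(c_hat, tcp.detach())

    # Stage 1: fix M entirely (freeze w) and learn theta with loss (4).
    classifier, head = TinyClassifier(), ConfidNetHead()
    for p in classifier.parameters():
        p.requires_grad_(False)
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)

    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # dummy batch
    feats, logits = classifier(x)
    loss = confidence_loss(logits, head(feats), y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

Stage 2, as described in the quoted setup, would then duplicate the encoder so that ConfidNet's copy can be fine-tuned (with dropout deactivated and a reduced learning rate) while M's encoder, and hence its predictions, remain unchanged.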