Accurate Layerwise Interpretable Competence Estimation

Authors: Vickram Rajendran, William LeVine

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | By considering distributional, data, and model uncertainty, ALICE empirically shows accurate competence estimation in common failure situations such as class-imbalanced datasets, out-of-distribution datasets, and poorly trained models. We compare our score with state-of-the-art confidence estimators such as model confidence and Trust Score, and show significant improvements in competence prediction over these methods on datasets such as DIGITS, CIFAR10, and CIFAR100.
Researcher Affiliation | Academia | Vickram Rajendran, William LeVine; The Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723; {vickram.rajendran, william.levine}@jhuapl.edu
Pseudocode | No | The paper describes the steps of the ALICE score but does not present them in a structured pseudocode or algorithm block (a hedged sketch of the scoring step appears at the end of this section).
Open Source Code | No | The paper does not provide an explicit statement or link for the release of open-source code for the described methodology.
Open Datasets | Yes | We compare our score with state-of-the-art confidence estimators such as model confidence and Trust Score, and show significant improvements in competence prediction over these methods on datasets such as DIGITS, CIFAR10, and CIFAR100. We first examine model uncertainty by performing an ablation study on both overfit and underfit classical models on DIGITS and VGG16 [27] on CIFAR100 [11].
Dataset Splits | Yes | The mean Average Precision is computed across 100 δs linearly spaced between the minimum and maximum of the E output (e.g., for cross-entropy we space δs between the minimum and the maximum cross-entropy error on a validation set).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) needed to replicate the experiment.
Experiment Setup | Yes | The mean Average Precision is computed across 100 δs linearly spaced between the minimum and maximum of the E output (e.g., for cross-entropy we space δs between the minimum and the maximum cross-entropy error on a validation set). For all experiments, we compute ALICE scores on the penultimate layer, as we empirically found this layer to provide the best results; we believe this is due to the penultimate layer having the most well-formed representations before the final predictions. Further experimental details are provided in Appendix A. (The δ sweep is illustrated in the first sketch below the table.)
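
The δ sweep quoted in the Dataset Splits and Experiment Setup rows is concrete enough to illustrate. Below is a minimal Python sketch, assuming per-sample validation errors (e.g., cross-entropy) in `val_errors` and per-sample competence scores in `scores`, with higher meaning more competent; the variable names, the use of scikit-learn's `average_precision_score`, and the choice to treat error below δ as the positive class are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of the evaluation protocol quoted above: mean Average
# Precision over 100 deltas linearly spaced between the min and max
# validation error. All names here are illustrative assumptions.
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(val_errors: np.ndarray,
                           scores: np.ndarray,
                           n_deltas: int = 100) -> float:
    deltas = np.linspace(val_errors.min(), val_errors.max(), n_deltas)
    aps = []
    for delta in deltas:
        # Treat samples whose error falls below delta as "competent".
        labels = (val_errors < delta).astype(int)
        if labels.min() == labels.max():
            continue  # AP is undefined when only one class is present
        aps.append(average_precision_score(labels, scores))
    return float(np.mean(aps))
```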
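Since the Pseudocode row notes that the ALICE steps are never given as an algorithm block, here is a hedged PyTorch sketch of the two pieces this section does pin down: computing at the penultimate layer, and combining the distributional, data, and model uncertainty terms named in the abstract into one score. The sequential-model slicing, the multiplicative combination rule, and every name below are assumptions for illustration, not the paper's exact construction; the estimators that would produce the three probability tensors are left abstract.

```python
# Illustrative only: penultimate-layer activations plus one plausible
# fusion of the three uncertainty terms named in the abstract. This is
# not the paper's exact ALICE construction.
import torch
import torch.nn as nn

def penultimate_activations(model: nn.Sequential, x: torch.Tensor) -> torch.Tensor:
    """Forward pass through every layer except the final prediction head."""
    feats = x
    for layer in list(model.children())[:-1]:
        feats = layer(feats)
    return feats

def combined_competence(p_in_dist: torch.Tensor,
                        p_class: torch.Tensor,
                        p_err_below_delta: torch.Tensor) -> torch.Tensor:
    """Fuse distributional, data, and model uncertainty multiplicatively
    into a per-sample competence score (an assumed combination rule)."""
    return p_in_dist * p_class * p_err_below_delta
```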