When Does Confidence-Based Cascade Deferral Suffice?

Authors: Wittawat Jitkrittum, Neha Gupta, Aditya K. Menon, Harikrishna Narasimhan, Ankit Rawat, Sanjiv Kumar

NeurIPS 2023

Reproducibility variables, each with its result and the LLM response supporting it:
Research Type: Experimental
LLM Response: In this section, we provide empirical evidence to support our analysis in Section 4.1 by considering the three failure modes in which confidence-based deferral underperforms. For each of these settings, we compute deferral curves that plot classification accuracy against the fraction of samples deferred to the second model (which implicitly measures the overall compute cost). In line with our analysis in Section 4.1, post-hoc deferral rules offer better accuracy-cost trade-offs in these settings.
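For concreteness, a deferral curve of this kind can be computed by sweeping the confidence threshold of model 1 and recording the resulting deferral rate and end-to-end accuracy. The following is a minimal sketch under assumed inputs (precomputed confidences and per-example correctness arrays); it is not code from the paper:

```python
import numpy as np

def deferral_curve(conf1, correct1, correct2, num_points=101):
    """Accuracy vs. deferral rate for a two-model confidence cascade.

    conf1:    (n,) model-1 confidence scores, e.g. max softmax probability.
    correct1: (n,) bool, whether model 1's prediction is correct.
    correct2: (n,) bool, whether model 2's prediction is correct.
    """
    rates, accs = [], []
    # Sweep the confidence threshold; samples below it are deferred to model 2.
    for t in np.quantile(conf1, np.linspace(0.0, 1.0, num_points)):
        defer = conf1 < t
        acc = np.where(defer, correct2, correct1).mean()
        rates.append(defer.mean())
        accs.append(acc)
    return np.array(rates), np.array(accs)
```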
Researcher Affiliation: Industry
LLM Response: Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar. Google Research, New York. {wittawat, nehagup, adityakmenon, hnarasimhan, ankitsrawat, sanjivk}@google.com
Pseudocode: Yes
LLM Response: Algorithm 1 Confidence-based cascades of K classifiers
Input: K ≥ 2 classifiers p^(1), ..., p^(K): X → Δ_L; thresholds c^(1), ..., c^(K-1) ∈ [0, 1]
Input: An input instance x ∈ X
1: for k = 1, 2, ..., K - 1 do
2:   if max_y p^(k)_y(x) > c^(k) then
3:     Predict class ŷ = argmax_y p^(k)_y(x)
4:     break
5:   end if
6: end for
7: Predict class ŷ = argmax_y p^(K)_y(x)
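The algorithm translates directly into code. Below is a minimal runnable Python sketch of this cascade; the function name, the numpy representation, and the calling convention are illustrative assumptions, not from the paper:

```python
import numpy as np

def cascade_predict(x, models, thresholds):
    """Confidence-based cascade over K probabilistic classifiers.

    models:     list of K callables; each maps an input x to a numpy
                probability vector over the L classes.
    thresholds: list of K - 1 confidence thresholds in [0, 1].
    """
    for model, c in zip(models[:-1], thresholds):
        probs = model(x)
        # Use the earliest model whose top-class confidence clears its threshold.
        if probs.max() > c:
            return int(probs.argmax())
    # Otherwise fall back to the final (largest) model.
    return int(models[-1](x).argmax())
```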
Open Source Code: No
LLM Response: The paper does not contain any explicit statement about open-sourcing the code for the methodology or provide a link to a code repository.
Open Datasets: Yes
LLM Response: We use MobileNet V2 [65] as h^(1), and a larger EfficientNet B0 [71] as h^(2). For hyperparameter details, see Appendix C. [...] We consider the CIFAR-100 dataset where training examples from pre-chosen L_noise ∈ {0, 10, 25} classes are assigned a uniformly drawn label. [...] To simulate distribution shift, we consider a long-tailed version of CIFAR-100 [40] where there are h ∈ {100, 50, 25} head classes, and 100 - h tail classes. Each head class has 500 training images, and each tail class has 50 training images.
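The long-tailed construction described above is straightforward to reproduce by subsampling. A minimal sketch, assuming integer class labels and that the first num_head class indices are the head classes (which head classes the paper picks is not specified here):

```python
import numpy as np

def make_long_tailed(labels, num_head, head_size=500, tail_size=50, seed=0):
    """Subsample indices for a long-tailed split: the first `num_head`
    classes keep `head_size` examples each, the rest keep `tail_size`.

    labels: (n,) integer class labels for the full training set.
    """
    rng = np.random.default_rng(seed)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        size = head_size if c < num_head else tail_size
        keep.append(rng.choice(idx, size=size, replace=False))
    return np.concatenate(keep)
```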
Dataset Splits: Yes
LLM Response: We train g on a held-out validation set. For full technical details of the post-hoc model architecture and training, see Appendix C. [...] Figure 1 shows the accuracy of confidence-based deferral (Confidence) and Relative Confidence (Equation (6)) on the standard ImageNet test set as a function of the deferral rate. [...] We consider the CIFAR-100 dataset where training examples from pre-chosen L_noise ∈ {0, 10, 25} classes are assigned a uniformly drawn label. The case of L_noise = 0 corresponds to the standard CIFAR-100 problem.
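Similarly, the label-noise setup can be simulated as follows. This sketch assumes the noisy classes are the first num_noisy_classes class indices; the paper only says they are pre-chosen:

```python
import numpy as np

def add_label_noise(labels, num_noisy_classes, num_classes=100, seed=0):
    """Relabel examples from the first `num_noisy_classes` classes with
    uniformly drawn labels, mimicking the paper's label-noise setup.
    """
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    noisy = labels < num_noisy_classes  # stand-in for the pre-chosen classes
    labels[noisy] = rng.integers(0, num_classes, size=noisy.sum())
    return labels
```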
Hardware Specification: No
LLM Response: The paper mentions using specific models like MobileNet V2 and EfficientNet B0, but does not provide any details about the hardware (e.g., CPU, GPU, memory) used to run the experiments or train these models.
Software Dependencies: No
LLM Response: The paper mentions a 'Multi-Layer Perceptron (MLP)' and the 'Adam' optimizer, but does not provide specific version numbers for any software libraries (e.g., TensorFlow, PyTorch, scikit-learn) or programming languages used.
Experiment Setup: Yes
LLM Response: For full technical details of the post-hoc model architecture and training, see Appendix C. We use the objectives described in Table 1 to train g. [...] For a post-hoc approach to be practical, the overhead from invoking a post-hoc model must be small relative to the costs of h^(1) and h^(2). To this end, in all of the following experiments, the post-hoc model g: X → ℝ is based on a lightweight, three-layer Multi-Layer Perceptron (MLP) that takes as input the probability outputs from model 1. That is, g(x) = MLP(p^(1)(x)), where p^(1)(x) ∈ Δ_L denotes all probability outputs from model 1. Learning g amounts to learning the MLP, as the two base models are fixed. We train g on a held-out validation set.
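To make the post-hoc setup concrete, here is a minimal sketch of such a three-layer MLP deferral model. PyTorch is an assumed framework (the paper does not name one), and the hidden width is an illustrative choice:

```python
import torch
import torch.nn as nn

class PostHocDeferral(nn.Module):
    """Three-layer MLP g(x) = MLP(p1(x)) producing a deferral score.

    Takes model 1's probability vector over L classes and outputs a
    single score indicating whether to defer to model 2.
    """
    def __init__(self, num_classes, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, p1):
        # p1: (batch, num_classes) probability outputs from model 1.
        return self.net(p1).squeeze(-1)

# Only g is trained (e.g. with Adam on a held-out validation set);
# the two base models h1 and h2 stay fixed.
```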