Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Post-hoc estimators for learning to defer to an expert
Authors: Harikrishna Narasimhan, Wittawat Jitkrittum, Aditya K. Menon, Ankit Rawat, Sanjiv Kumar
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, Experimental Results: "We now present empirical results illustrating the efficacy of both our proposed post-hoc estimators." |
| Researcher Affiliation | Industry | Harikrishna Narasimhan, Google Research, Mountain View; Wittawat Jitkrittum, Google Research, New York; Aditya Krishna Menon, Google Research, New York; Ankit Singh Rawat, Google Research, New York; Sanjiv Kumar, Google Research, New York |
| Pseudocode | No | The paper includes figures (Figure 2, Figure 3) that are flowcharts summarizing procedures but do not contain pseudocode or algorithm blocks. |
| Open Source Code | No | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | We now report results on the CIFAR-10, CIFAR-100 [16], and ImageNet [10] datasets. |
| Dataset Splits | No | For each c0, we train the base model using the CSS loss (4), and report the resulting accuracy. For c0 > 0, the base model exhibits underfitting, evidenced by significant degradation in the training accuracy. In Section 3.1, we trace this behaviour to the loss applying a high level of label smoothing [31] to incorrect labels. Consequently, the entropy of the base model probabilities steadily increases with c0 (right panel). |
| Hardware Specification | No | 3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No] The newly proposed methods in this paper involve simple post-hoc operations over existing models. Thus, they do not add significant computational overhead. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | On CIFAR-100, we consider a learning-to-defer setting comprising a ResNet-8 base model and a ResNet-32 expert h_exp. We assume a cost c_exp(x, y) = c0 + 1(y ≠ h_exp(x)) of deferring to the expert (see Section 2 for details on notation), where the fixed cost c0 is varied over [0, 1]. For each c0, we train the base model using the CSS loss (4), and report the resulting accuracy. |
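The deferral cost quoted in the Experiment Setup row, c_exp(x, y) = c0 + 1(y ≠ h_exp(x)), can be sketched directly. This is an illustrative snippet only, not code from the paper; the function name `deferral_cost` and its arguments are assumptions for the example.

```python
def deferral_cost(expert_prediction: int, true_label: int, c0: float) -> float:
    """Cost of deferring to the expert: a fixed cost c0, plus 1 if the
    expert's prediction disagrees with the true label (the 0/1 indicator
    term 1(y != h_exp(x)) from the paper's cost definition)."""
    return c0 + (1.0 if expert_prediction != true_label else 0.0)
```

For example, with c0 = 0.2, deferring when the expert is correct costs 0.2, while deferring when the expert errs costs 1.2; varying c0 over [0, 1] trades off how readily the base model should defer.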