Hierarchical Selective Classification
Authors: Shani Goren, Ido Galil, Ran El-Yaniv
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive empirical studies on over a thousand ImageNet classifiers reveal that training regimes such as CLIP, pretraining on ImageNet21k, and knowledge distillation boost hierarchical selective performance. Lastly, we show that HSC improves both selective performance and confidence calibration. In this section we evaluate the methods introduced in Section 3 and Section 4. The evaluation was performed on 1,115 vision models pretrained on ImageNet1k [11], and 6 models pretrained on iNat-21 [24] (available in timm 0.9.16 [49] and torchvision 0.15.1 [32]). |
| Researcher Affiliation | Collaboration | Shani Goren* (Technion) shani.goren@gmail.com; Ido Galil* (Technion, NVIDIA) idogalil.ig@gmail.com, igalil@nvidia.com; Ran El-Yaniv (Technion, NVIDIA) rani@cs.technion.ac.il, relyaniv@nvidia.com |
| Pseudocode | Yes | Algorithm 1: Climbing Inference Rule (a hedged sketch of such a rule appears after this table) |
| Open Source Code | Yes | Code is available at https://github.com/shanigoren/Hierarchical-Selective-Classification. |
| Open Datasets | Yes | The evaluation was performed on 1,115 vision models pretrained on ImageNet1k [11], and 6 models pretrained on iNat-21 [24] (available in timm 0.9.16 [49] and torchvision 0.15.1 [32]). A model-loading sketch follows the table. |
| Dataset Splits | Yes | The reported results were obtained on the corresponding validation sets (ImageNet1k and iNat-21). For each model and target accuracy the algorithm was run 1000 times, each with a randomly drawn calibration set. Table 2: Results (mean scores) comparing the hierarchical selective threshold algorithm (Algorithm 2) with DARTS, repeated 1000 times for each model and target accuracy with a randomly drawn calibration set of 5,000 samples. Table 5: Comparison of the hierarchical selective threshold algorithm (Algorithm 2) with DARTS, repeated 100 times for each model and target accuracy with a randomly drawn calibration set of 10,000 samples. A simplified calibration sketch follows the table. |
| Hardware Specification | Yes | All experiments were conducted on a single machine with one Nvidia A4000 GPU. |
| Software Dependencies | Yes | available in timm 0.9.16 [49] and torchvision 0.15.1 [32] |
| Experiment Setup | Yes | All models were fine-tuned on the ImageNet validation set for 20 epochs using the SGD optimizer with a warmup scheduler and a batch size of 2048 on a single NVIDIA A40 GPU. A hedged training-loop sketch follows the table. |
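
The Pseudocode row names Algorithm 1 (Climbing Inference Rule) but the report does not reproduce it. Below is a minimal sketch of a climbing-style hierarchical selective inference rule, assuming the hierarchy is given as a parent map, that a node's confidence is the summed softmax mass of the leaves in its subtree, and that climbing stops at the first ancestor whose confidence reaches the threshold. All names here are illustrative, not taken from the paper.

```python
import numpy as np


def climbing_inference(probs, parent, leaves_under, root, theta):
    """Return the most specific hierarchy node whose confidence reaches theta."""
    node = int(np.argmax(probs))             # start at the top-1 leaf prediction
    while node is not None:
        conf = float(probs[list(leaves_under[node])].sum())
        if conf >= theta:                    # confident enough: stop climbing
            return node, conf
        node = parent[node]                  # otherwise retreat to the parent
    return root, 1.0                         # the root covers all leaves (confidence 1)


# Toy 4-leaf hierarchy: leaves 0-3, internal nodes "animal" and "vehicle", root "object".
parent = {0: "animal", 1: "animal", 2: "vehicle", 3: "vehicle",
          "animal": "object", "vehicle": "object", "object": None}
leaves_under = {0: [0], 1: [1], 2: [2], 3: [3],
                "animal": [0, 1], "vehicle": [2, 3], "object": [0, 1, 2, 3]}
probs = np.array([0.45, 0.40, 0.10, 0.05])
print(climbing_inference(probs, parent, leaves_under, "object", theta=0.8))
# -> ('animal', 0.85): the leaf prediction is not confident enough, but its parent is.
```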
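The Open Datasets and Software Dependencies rows state only that the evaluated models come from timm 0.9.16 and torchvision 0.15.1. The sketch below shows one way such pretrained ImageNet1k models could be loaded and queried for the leaf softmax scores the inference rule consumes; the concrete architectures are illustrative assumptions, not the 1,115 models evaluated in the paper.

```python
import timm
import torch
import torchvision.models as tv_models

# One timm model and one torchvision model, both pretrained on ImageNet1k.
timm_model = timm.create_model("resnet50", pretrained=True).eval()
tv_model = tv_models.vit_b_16(weights=tv_models.ViT_B_16_Weights.IMAGENET1K_V1).eval()

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)           # stand-in for a preprocessed image
    probs = timm_model(x).softmax(dim=-1)     # leaf confidences used by the inference rule
print(probs.shape)                            # torch.Size([1, 1000])
```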
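The Dataset Splits row describes repeated runs of the hierarchical selective threshold algorithm (Algorithm 2), each on a randomly drawn calibration set of 5,000 (or 10,000) samples for a given target accuracy. The following is a deliberately simplified stand-in for that protocol, not the paper's Algorithm 2: it draws one random calibration subset and picks the smallest confidence threshold whose calibration accuracy meets the target, then evaluates on the held-out remainder.

```python
import numpy as np


def calibrate_threshold(confidences, correct, target_accuracy):
    """Smallest threshold whose selective accuracy on the calibration set
    reaches target_accuracy (i.e., maximal coverage under that constraint)."""
    for theta in np.sort(confidences):               # candidate thresholds, ascending
        accepted = confidences >= theta
        if correct[accepted].mean() >= target_accuracy:
            return float(theta)
    return 1.0                                       # no threshold meets the target


# Toy data standing in for one model's predictions on the 50,000 ImageNet1k
# validation images: higher confidence -> more likely to be correct.
rng = np.random.default_rng(0)
conf = rng.random(50_000)
corr = rng.random(50_000) < conf

cal_idx = rng.choice(50_000, size=5_000, replace=False)   # one random calibration draw
theta = calibrate_threshold(conf[cal_idx], corr[cal_idx], target_accuracy=0.9)

test_mask = np.ones(50_000, dtype=bool)
test_mask[cal_idx] = False
accepted = conf[test_mask] >= theta
print(theta, corr[test_mask][accepted].mean())            # accuracy on the held-out rest
```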
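The Experiment Setup row quotes fine-tuning on the ImageNet validation set for 20 epochs with SGD, a warmup scheduler, and a batch size of 2048 on a single NVIDIA A40 GPU. The sketch below fills in unspecified details (learning rate, momentum, warmup length, the concrete scheduler, and the data loader) with assumptions, so read it as an illustration of the quoted setup rather than the authors' training script.

```python
import timm
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR
from torch.utils.data import DataLoader, TensorDataset

EPOCHS, WARMUP_EPOCHS = 20, 2        # 20 epochs is quoted; the warmup length is a guess

model = timm.create_model("resnet50", pretrained=True)                  # illustrative model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)   # lr/momentum assumed
scheduler = SequentialLR(
    optimizer,
    schedulers=[LinearLR(optimizer, start_factor=0.01, total_iters=WARMUP_EPOCHS),
                CosineAnnealingLR(optimizer, T_max=EPOCHS - WARMUP_EPOCHS)],
    milestones=[WARMUP_EPOCHS],
)

# Tiny random tensors stand in for the ImageNet validation data; the quoted
# batch size is 2048, shrunk here so the sketch runs anywhere.
train_loader = DataLoader(
    TensorDataset(torch.randn(16, 3, 224, 224), torch.randint(0, 1000, (16,))),
    batch_size=8, shuffle=True,
)

model.train()
for epoch in range(EPOCHS):
    for images, labels in train_loader:
        loss = torch.nn.functional.cross_entropy(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                 # warmup, then cosine decay, stepped per epoch
```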