Hierarchical Selective Classification

Authors: Shani Goren, Ido Galil, Ran El-Yaniv

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive empirical studies on over a thousand ImageNet classifiers reveal that training regimes such as CLIP, pretraining on ImageNet21k, and knowledge distillation boost hierarchical selective performance. Lastly, we show that HSC improves both selective performance and confidence calibration. In this section we evaluate the methods introduced in Section 3 and Section 4. The evaluation was performed on 1,115 vision models pretrained on ImageNet1k [11], and 6 models pretrained on iNat-21 [24] (available in timm 0.9.16 [49] and torchvision 0.15.1 [32]).
Researcher Affiliation | Collaboration | Shani Goren* (Technion), shani.goren@gmail.com; Ido Galil* (Technion, NVIDIA), idogalil.ig@gmail.com, igalil@nvidia.com; Ran El-Yaniv (Technion, NVIDIA), rani@cs.technion.ac.il, relyaniv@nvidia.com
Pseudocode | Yes | Algorithm 1 Climbing Inference Rule
Open Source Code | Yes | Code is available at https://github.com/shanigoren/Hierarchical-Selective-Classification.
Open Datasets | Yes | The evaluation was performed on 1,115 vision models pretrained on ImageNet1k [11], and 6 models pretrained on iNat-21 [24] (available in timm 0.9.16 [49] and torchvision 0.15.1 [32]).
Dataset Splits | Yes | The reported results were obtained on the corresponding validation sets (ImageNet1k and iNat-21). For each model and target accuracy the algorithm was run 1000 times, each with a randomly drawn calibration set. Table 2: Results (mean scores) comparing the hierarchical selective threshold algorithm (Algorithm 2) with DARTS, repeated 1000 times for each model and target accuracy with a randomly drawn calibration set of 5,000 samples. Table 5: Comparison of hierarchical selective threshold algorithm (Algorithm 2) with DARTS, repeated 100 times for each model and target accuracy with a randomly drawn calibration set of 10,000 samples.
Hardware Specification | Yes | All experiments were conducted on a single machine with one Nvidia A4000 GPU.
Software Dependencies | Yes | available in timm 0.9.16 [49] and torchvision 0.15.1 [32]
Experiment Setup | Yes | All models were fine-tuned on the ImageNet validation set for 20 epochs using the SGD optimizer with a warmup scheduler and a batch size of 2048 on a single NVIDIA A40 GPU.
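
The repeated-calibration protocol described under Dataset Splits (a fresh randomly drawn calibration set for each run, with mean scores reported) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the `evaluate` callback, and the choice to treat all non-calibration samples as the held-out test set are assumptions made here for illustration. The 50,000-sample figure is the size of the ImageNet1k validation set.

```python
import random


def draw_calibration_split(n_samples, calib_size, rng):
    """Randomly partition indices 0..n_samples-1 into calibration and held-out sets."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[:calib_size], indices[calib_size:]


def repeated_evaluation(n_samples, calib_size, n_repeats, evaluate, seed=0):
    """Average evaluate(calib_idx, test_idx) over n_repeats independent random splits."""
    rng = random.Random(seed)  # one seeded generator so the whole run is reproducible
    scores = []
    for _ in range(n_repeats):
        calib_idx, test_idx = draw_calibration_split(n_samples, calib_size, rng)
        scores.append(evaluate(calib_idx, test_idx))
    return sum(scores) / len(scores)


def dummy_evaluate(calib_idx, test_idx):
    """Hypothetical placeholder metric: the fraction of samples that were held out.

    A real callback would calibrate a threshold on calib_idx and score it on test_idx.
    """
    return len(test_idx) / (len(calib_idx) + len(test_idx))


if __name__ == "__main__":
    # 5,000 calibration samples out of 50,000; only 10 repeats here to keep the demo
    # quick (the Dataset Splits entry reports 1000 repeats for Table 2).
    print(repeated_evaluation(50_000, 5_000, 10, dummy_evaluate))
```

In practice, `evaluate` would be replaced by a function that fits the selective threshold on the calibration indices and measures the resulting accuracy and coverage on the remaining samples.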