Hierarchical Selective Classification
Authors: Shani Goren, Ido Galil, Ran El-Yaniv
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive empirical studies on over a thousand ImageNet classifiers reveal that training regimes such as CLIP, pretraining on ImageNet21k, and knowledge distillation boost hierarchical selective performance. Lastly, we show that HSC improves both selective performance and confidence calibration. In this section we evaluate the methods introduced in Section 3 and Section 4. The evaluation was performed on 1,115 vision models pretrained on ImageNet1k [11], and 6 models pretrained on iNat-21 [24] (available in timm 0.9.16 [49] and torchvision 0.15.1 [32]). |
| Researcher Affiliation | Collaboration | Shani Goren* (Technion) shani.goren@gmail.com; Ido Galil* (Technion, NVIDIA) idogalil.ig@gmail.com, igalil@nvidia.com; Ran El-Yaniv (Technion, NVIDIA) rani@cs.technion.ac.il, relyaniv@nvidia.com |
| Pseudocode | Yes | Algorithm 1: Climbing Inference Rule (a hedged sketch of such a rule appears after this table) |
| Open Source Code | Yes | Code is available at https://github.com/shanigoren/Hierarchical-Selective-Classification. |
| Open Datasets | Yes | The evaluation was performed on 1,115 vision models pretrained on ImageNet1k [11], and 6 models pretrained on iNat-21 [24] (available in timm 0.9.16 [49] and torchvision 0.15.1 [32]). A model-loading sketch follows the table. |
| Dataset Splits | Yes | The reported results were obtained on the corresponding validation sets (ImageNet1k and iNat-21). For each model and target accuracy the algorithm was run 1000 times, each with a randomly drawn calibration set. Table 2: Results (mean scores) comparing the hierarchical selective threshold algorithm (Algorithm 2) with DARTS, repeated 1000 times for each model and target accuracy with a randomly drawn calibration set of 5,000 samples. Table 5: Comparison of the hierarchical selective threshold algorithm (Algorithm 2) with DARTS, repeated 100 times for each model and target accuracy with a randomly drawn calibration set of 10,000 samples. A simplified calibration sketch follows the table. |
| Hardware Specification | Yes | All experiments were conducted on a single machine with one Nvidia A4000 GPU. |
| Software Dependencies | Yes | available in timm 0.9.16 [49] and torchvision 0.15.1 [32] |
| Experiment Setup | Yes | All models were fine-tuned on the ImageNet validation set for 20 epochs using the SGD optimizer with a warmup scheduler and a batch size of 2048 on a single NVIDIA A40 GPU. A hedged training-loop sketch follows the table. |
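
The Pseudocode row names Algorithm 1 (Climbing Inference Rule) but the report does not reproduce it. Below is a minimal sketch of a climbing-style hierarchical selective inference rule, assuming the hierarchy is given as a parent map, that a node's confidence is the summed softmax mass of the leaves in its subtree, and that climbing stops at the first ancestor whose confidence reaches the threshold. All names here are illustrative, not taken from the paper.

```python
import numpy as np


def climbing_inference(probs, parent, leaves_under, root, theta):
    """Return the most specific hierarchy node whose confidence reaches theta."""
    node = int(np.argmax(probs))             # start at the top-1 leaf prediction
    while node is not None:
        conf = float(probs[list(leaves_under[node])].sum())
        if conf >= theta:                    # confident enough: stop climbing
            return node, conf
        node = parent[node]                  # otherwise retreat to the parent
    return root, 1.0                         # the root covers all leaves (confidence 1)


# Toy 4-leaf hierarchy: leaves 0-3, internal nodes "animal" and "vehicle", root "object".
parent = {0: "animal", 1: "animal", 2: "vehicle", 3: "vehicle",
          "animal": "object", "vehicle": "object", "object": None}
leaves_under = {0: [0], 1: [1], 2: [2], 3: [3],
                "animal": [0, 1], "vehicle": [2, 3], "object": [0, 1, 2, 3]}
probs = np.array([0.45, 0.40, 0.10, 0.05])
print(climbing_inference(probs, parent, leaves_under, "object", theta=0.8))
# -> ('animal', 0.85): the leaf prediction is not confident enough, but its parent is.
```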
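The Open Datasets and Software Dependencies rows state only that the evaluated models come from timm 0.9.16 and torchvision 0.15.1. The sketch below shows one way such pretrained ImageNet1k models could be loaded and queried for the leaf softmax scores the inference rule consumes; the concrete architectures are illustrative assumptions, not the 1,115 models evaluated in the paper.

```python
import timm
import torch
import torchvision.models as tv_models

# One timm model and one torchvision model, both pretrained on ImageNet1k.
timm_model = timm.create_model("resnet50", pretrained=True).eval()
tv_model = tv_models.vit_b_16(weights=tv_models.ViT_B_16_Weights.IMAGENET1K_V1).eval()

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)           # stand-in for a preprocessed image
    probs = timm_model(x).softmax(dim=-1)     # leaf confidences used by the inference rule
print(probs.shape)                            # torch.Size([1, 1000])
```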
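The Dataset Splits row describes repeated runs of the hierarchical selective threshold algorithm (Algorithm 2), each on a randomly drawn calibration set of 5,000 (or 10,000) samples for a given target accuracy. The following is a deliberately simplified stand-in for that protocol, not the paper's Algorithm 2: it draws one random calibration subset and picks the smallest confidence threshold whose calibration accuracy meets the target, then evaluates on the held-out remainder.

```python
import numpy as np


def calibrate_threshold(confidences, correct, target_accuracy):
    """Smallest threshold whose selective accuracy on the calibration set
    reaches target_accuracy (i.e., maximal coverage under that constraint)."""
    for theta in np.sort(confidences):               # candidate thresholds, ascending
        accepted = confidences >= theta
        if correct[accepted].mean() >= target_accuracy:
            return float(theta)
    return 1.0                                       # no threshold meets the target


# Toy data standing in for one model's predictions on the 50,000 ImageNet1k
# validation images: higher confidence -> more likely to be correct.
rng = np.random.default_rng(0)
conf = rng.random(50_000)
corr = rng.random(50_000) < conf

cal_idx = rng.choice(50_000, size=5_000, replace=False)   # one random calibration draw
theta = calibrate_threshold(conf[cal_idx], corr[cal_idx], target_accuracy=0.9)

test_mask = np.ones(50_000, dtype=bool)
test_mask[cal_idx] = False
accepted = conf[test_mask] >= theta
print(theta, corr[test_mask][accepted].mean())            # accuracy on the held-out rest
```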
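The Experiment Setup row quotes fine-tuning on the ImageNet validation set for 20 epochs with SGD, a warmup scheduler, and a batch size of 2048 on a single NVIDIA A40 GPU. The sketch below fills in unspecified details (learning rate, momentum, warmup length, the concrete scheduler, and the data loader) with assumptions, so read it as an illustration of the quoted setup rather than the authors' training script.

```python
import timm
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR
from torch.utils.data import DataLoader, TensorDataset

EPOCHS, WARMUP_EPOCHS = 20, 2        # 20 epochs is quoted; the warmup length is a guess

model = timm.create_model("resnet50", pretrained=True)                  # illustrative model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)   # lr/momentum assumed
scheduler = SequentialLR(
    optimizer,
    schedulers=[LinearLR(optimizer, start_factor=0.01, total_iters=WARMUP_EPOCHS),
                CosineAnnealingLR(optimizer, T_max=EPOCHS - WARMUP_EPOCHS)],
    milestones=[WARMUP_EPOCHS],
)

# Tiny random tensors stand in for the ImageNet validation data; the quoted
# batch size is 2048, shrunk here so the sketch runs anywhere.
train_loader = DataLoader(
    TensorDataset(torch.randn(16, 3, 224, 224), torch.randint(0, 1000, (16,))),
    batch_size=8, shuffle=True,
)

model.train()
for epoch in range(EPOCHS):
    for images, labels in train_loader:
        loss = torch.nn.functional.cross_entropy(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                 # warmup, then cosine decay, stepped per epoch
```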