Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

Authors: Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We validate NORCAL on the LVIS [12] dataset for both long-tailed object detection and instance segmentation. NORCAL can consistently improve not only baseline models (e.g., Faster R-CNN [43] or Mask R-CNN [18]) but also many models that are dedicated to the long-tailed distribution. Hence, our best results notably advance the state of the art. Moreover, NORCAL can improve both the standard average precision (AP) and the category-independent APFixed metric [7], implying that NORCAL does not trade frequent class predictions for rare classes but rather improves the proposal ranking within each class. Indeed, through a detailed analysis, we show that NORCAL can in general improve both the precision and recall for each class, making it appealing to almost any existing evaluation metric.
Researcher Affiliation Collaboration Tai-Yu Pan (1), Cheng Zhang (1), Yandong Li (2), Hexiang Hu (2), Dong Xuan (1), Soravit Changpinyo (2), Boqing Gong (2), Wei-Lun Chao (1). (1) The Ohio State University; (2) Google Research
Pseudocode No No pseudocode or algorithm block found.
Open Source Code Yes Our code is publicly available at https://github.com/tydpan/NorCal.
Open Datasets Yes We validate NORCAL on the LVIS v1 dataset [12], a benchmark dataset for large-vocabulary instance segmentation which has 100K/19.8K/19.8K training/validation/test images.
Dataset Splits Yes We validate NORCAL on the LVIS v1 dataset [12], a benchmark dataset for large-vocabulary instance segmentation which has 100K/19.8K/19.8K training/validation/test images. ... All results are reported on the validation set of LVIS v1.
Hardware Specification No No specific hardware details (e.g., GPU/CPU models, memory) are provided.
Software Dependencies No No specific software dependencies with version numbers are mentioned.
Experiment Setup Yes We apply NORCAL to post-calibrate several representative baseline models, for which we use the released checkpoints from the corresponding papers. ... For NORCAL, (a) we investigate different mechanisms by applying post-calibration to the classifier logits, exponentials, or probabilities (cf. Eq. 4); (b) we study different types of calibration factor a_c, using the class-dependent temperature (CDT) [61] presented in Eq. 5 or the effective number of samples (ENS) [6]; (c) we compare with or without score normalization. We tune the only hyper-parameter of NORCAL (i.e., γ in a_c) on training data.
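
The experiment setup above describes post-calibration only in prose, so the following is a minimal sketch of one variant it names: scaling the classifier exponentials by a per-class factor and then optionally re-normalizing. This page does not reproduce Eq. 4 or Eq. 5, so several details here are assumptions rather than the authors' implementation: the factor is taken to be the training count of each class raised to a power gamma, the background class is assumed to be the last logit and is left unscaled, and the no-normalization branch simply keeps the original softmax denominator. The function name and its parameters are hypothetical.

```python
import math


def norcal_style_scores(logits, class_counts, gamma, normalize=True):
    """Hypothetical sketch of post-hoc calibration on classifier exponentials.

    logits:       one proposal's logits, foreground classes first,
                  background last (assumed layout).
    class_counts: training-set count per foreground class, all > 0.
    gamma:        the single hyper-parameter (assumed exponent on the count).
    """
    if len(logits) != len(class_counts) + 1:
        raise ValueError("expected one logit per foreground class plus background")

    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]

    # Assumed calibration factor: count ** gamma (larger for frequent classes).
    factors = [n ** gamma for n in class_counts]

    # Scale down foreground exponentials; leave the background term untouched.
    foreground = [e / a for e, a in zip(exps[:-1], factors)]
    background = exps[-1]

    # With normalization the calibrated scores sum to 1 again; without it,
    # the original softmax denominator is reused (an assumed reading).
    denom = sum(foreground) + background if normalize else sum(exps)
    return [s / denom for s in foreground] + [background / denom]


# Usage: two foreground classes (frequent, rare) with equal logits.
scores = norcal_style_scores([2.0, 2.0, 0.0], [1000, 10], gamma=0.5)
```

With gamma set to 0 every factor equals 1 and the function reduces to an ordinary softmax, which is a convenient sanity check before tuning gamma on training data as the setup describes.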