DropLoss for Long-Tail Instance Segmentation

Authors: Ting-I Hsieh, Esther Robb, Hwann-Tzong Chen, Jia-Bin Huang

AAAI 2021, pp. 1549-1557

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present state-of-the-art instance segmentation results on the challenging long-tail LVIS dataset (Gupta, Dollár, and Girshick 2019). To validate the effectiveness of this approach, we compare across different architectures and backbones and integrate with additional long-tail resampling methods. We find that DropLoss demonstrates consistently improved results in AP and AR across all these experimental settings.
Researcher Affiliation | Collaboration | Ting-I Hsieh (1), Esther Robb (2), Hwann-Tzong Chen (1,3), Jia-Bin Huang (2); affiliations: 1 National Tsing Hua University, 2 Virginia Tech, 3 Aeolus Robotics
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/timy90022/DropLoss.
Open Datasets | Yes | Following the previous work on equalization loss (Tan et al. 2020), we train and evaluate our model on the LVIS benchmark dataset (Gupta, Dollár, and Girshick 2019). LVIS is a large-vocabulary instance segmentation dataset containing 1,230 categories.
Dataset Splits | Yes | We train our model on the 57K-image LVIS v0.5 training set and evaluate it on the 5K-image LVIS v0.5 validation set. (The splits appear in the config and evaluation sketches after the table.)
Hardware Specification | Yes | We train the network using stochastic gradient descent with a momentum of 0.9 and a weight decay of 0.0001 for 90K iterations, with batch size 16 on eight parallel NVIDIA 2080 Ti GPUs.
Software Dependencies | No | The paper mentions using the Detectron2 (Wu et al. 2019) framework but does not specify version numbers for it or for other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We train the network using stochastic gradient descent with a momentum of 0.9 and a weight decay of 0.0001 for 90K iterations, with batch size 16 on eight parallel NVIDIA 2080 Ti GPUs. We initialize the learning rate to 0.2 and decay it by a ratio of 0.1 at iterations 60,000 and 80,000. We use the Detectron2 (Wu et al. 2019) framework with default data augmentation. The data augmentation includes scale jitter with a short edge of (640, 672, 704, 736, 768, 800) pixels, a long edge of no more than 1,333 pixels, and horizontal flipping. In the Region Proposal Network (RPN), we sample 256 anchors with a 1:1 foreground-background ratio to compute the RPN loss and choose 512 ROI-aligned proposals per image with a 1:3 foreground-background ratio for later predictions. (These settings are mapped onto a Detectron2 config in the first sketch after the table.)
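
The Experiment Setup and Dataset Splits rows map almost one-to-one onto Detectron2's config system. The following is a minimal sketch, not the authors' released code: only the quoted values (splits, solver schedule, augmentation sizes, sampling ratios) come from the paper. The base model choice (a Mask R-CNN YAML that would be merged in first) is left out, and the dataset names "lvis_v0.5_train" / "lvis_v0.5_val" assume Detectron2's built-in LVIS registration with the annotations on disk.

    from detectron2.config import get_cfg

    cfg = get_cfg()

    # LVIS v0.5 splits: 57K training images, 5K validation images.
    cfg.DATASETS.TRAIN = ("lvis_v0.5_train",)
    cfg.DATASETS.TEST = ("lvis_v0.5_val",)

    # SGD: momentum 0.9, weight decay 1e-4, 90K iterations, batch size 16
    # (i.e., two images per GPU on eight 2080 Ti GPUs).
    cfg.SOLVER.IMS_PER_BATCH = 16
    cfg.SOLVER.BASE_LR = 0.2
    cfg.SOLVER.MOMENTUM = 0.9
    cfg.SOLVER.WEIGHT_DECAY = 0.0001
    cfg.SOLVER.MAX_ITER = 90000
    cfg.SOLVER.STEPS = (60000, 80000)  # LR decays 0.2 -> 0.02 -> 0.002
    cfg.SOLVER.GAMMA = 0.1

    # Scale jitter on the short edge, long edge capped at 1,333 pixels.
    # Horizontal flipping is on by default in the training loader.
    cfg.INPUT.MIN_SIZE_TRAIN = (640, 672, 704, 736, 768, 800)
    cfg.INPUT.MAX_SIZE_TRAIN = 1333

    # 256 RPN anchors at a 1:1 foreground-background ratio; 512 ROI-aligned
    # proposals per image at a 1:3 ratio (positive fraction 0.25).
    cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE = 256
    cfg.MODEL.RPN.POSITIVE_FRACTION = 0.5
    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512
    cfg.MODEL.ROI_HEADS.POSITIVE_FRACTION = 0.25

With a base model config merged in beforehand, this cfg could be handed to detectron2.engine.DefaultTrainer to reproduce the reported schedule.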
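
For the evaluation half of the Dataset Splits row, Detectron2 ships an LVISEvaluator. Another sketch under the same assumptions (the checkpoint path is a placeholder, and the evaluator's keyword arguments may differ slightly across Detectron2 versions):

    from detectron2.checkpoint import DetectionCheckpointer
    from detectron2.data import build_detection_test_loader
    from detectron2.evaluation import LVISEvaluator, inference_on_dataset
    from detectron2.modeling import build_model

    # Build the model from the same cfg and load trained weights.
    model = build_model(cfg)
    DetectionCheckpointer(model).load("output/model_final.pth")  # placeholder path
    model.eval()

    # Run LVIS-style AP/AR evaluation on the 5K-image validation split.
    evaluator = LVISEvaluator("lvis_v0.5_val", output_dir="./eval")
    val_loader = build_detection_test_loader(cfg, "lvis_v0.5_val")
    results = inference_on_dataset(model, val_loader, evaluator)
    print(results)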