DropLoss for Long-Tail Instance Segmentation
Authors: Ting-I Hsieh, Esther Robb, Hwann-Tzong Chen, Jia-Bin Huang (pp. 1549-1557)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present state-of-the-art instance segmentation results on the challenging long-tail LVIS dataset (Gupta, Dollár, and Girshick 2019). To validate the effectiveness of this approach, we compare across different architectures and backbones and integrate with additional long-tail resampling methods. We find that DropLoss demonstrates consistently improved results in AP and AR across all these experimental settings. |
| Researcher Affiliation | Collaboration | Ting-I Hsieh1 , Esther Robb2 , Hwann-Tzong Chen1,3, Jia-Bin Huang2 1 National Tsing Hua University 2 Virginia Tech 3 Aeolus Robotics |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/timy90022/DropLoss. |
| Open Datasets | Yes | Following the previous work equalization loss (Tan et al. 2020), we train and evaluate our model on the LVIS benchmark dataset. LVIS is a large vocabulary instance segmentation dataset, containing 1,230 categories. LVIS (Gupta, Dollár, and Girshick 2019) |
| Dataset Splits | Yes | We train our model on the 57K-image LVIS v0.5 training set and evaluate it on the 5K-image LVIS v0.5 validation set. |
| Hardware Specification | Yes | We train the network using stochastic gradient descent with a momentum of 0.9 and a weight decay of 0.0001 for 90K iterations, with batch size 16 on eight parallel NVIDIA 2080 Ti GPUs. |
| Software Dependencies | No | The paper mentions using the 'Detectron2 (Wu et al. 2019) framework', but does not specify any version numbers for it or any other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We train the network using stochastic gradient descent with a momentum of 0.9 and a weight decay of 0.0001 for 90K iterations, with batch size 16 on eight parallel NVIDIA 2080 Ti GPUs. We initialize the learning rate to 0.2 and decay it by a ratio of 0.1 at iterations 60,000 and 80,000. We use the Detectron2 (Wu et al. 2019) framework with default data augmentation. The data augmentation includes scale jitter with a short edge of (640, 672, 704, 736, 768, 800) pixels and a long edge of no more than 1,333 pixels, plus horizontal flipping. In the Region Proposal Network (RPN), we sample 256 anchors with a 1:1 ratio between foreground and background to compute the RPN loss and choose 512 ROI-aligned proposals per image with a 1:3 foreground-background ratio for later predictions. |
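The experiment setup reported above can be sketched as a Detectron2-style configuration. This is a minimal sketch: the numeric values come from the paper's quoted setup, but the key names follow Detectron2's config conventions and are assumptions, since the paper does not publish its config file.

```python
# Hedged sketch of the reported training setup as a Detectron2-style config
# dictionary. Values are as quoted from the paper; the key names mirror
# Detectron2's conventions but are assumptions, not the authors' actual config.
train_config = {
    "SOLVER": {
        "OPTIMIZER": "SGD",
        "MOMENTUM": 0.9,
        "WEIGHT_DECAY": 0.0001,
        "MAX_ITER": 90_000,
        "IMS_PER_BATCH": 16,          # total batch size across 8 GPUs
        "BASE_LR": 0.2,
        "GAMMA": 0.1,                 # LR decay ratio
        "STEPS": (60_000, 80_000),    # iterations at which LR decays
    },
    "INPUT": {
        # scale jitter: short edge sampled from these sizes
        "MIN_SIZE_TRAIN": (640, 672, 704, 736, 768, 800),
        "MAX_SIZE_TRAIN": 1333,       # long-edge cap in pixels
        "RANDOM_FLIP": "horizontal",
    },
    "MODEL": {
        "RPN": {
            "BATCH_SIZE_PER_IMAGE": 256,  # sampled anchors for RPN loss
            "POSITIVE_FRACTION": 0.5,     # 1:1 foreground:background
        },
        "ROI_HEADS": {
            "BATCH_SIZE_PER_IMAGE": 512,  # ROI-aligned proposals per image
            "POSITIVE_FRACTION": 0.25,    # 1:3 foreground:background
        },
    },
}
```

Expressed this way, the 1:1 RPN sampling ratio and 1:3 ROI ratio map onto `POSITIVE_FRACTION` values of 0.5 and 0.25 respectively, which is how Detectron2 parameterizes foreground/background sampling.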