How to Save your Annotation Cost for Panoptic Segmentation?

Authors: Xuefeng Du, Chenhan Jiang, Hang Xu, Gengwei Zhang, Zhenguo Li

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on COCO benchmark show the superiority of our method, e.g. achieving a segmentation quality of 43.4% compared to 43.0% of OCFusion while saving 2.4x annotation cost.
Researcher Affiliation | Collaboration | 1 Xi'an Jiaotong University, 2 Huawei Noah's Ark Lab, 3 Sun Yat-sen University
Pseudocode | No | The paper describes the proposed methods in detail but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a statement about open-sourcing the code or a link to a repository.
Open Datasets | Yes | We conduct experiments on MS-COCO 2017 (Lin et al. 2014).
Dataset Splits | Yes | It has 80 thing categories and 53 stuff categories, which is divided into train set (118K images), val set (5K images) and test set (20K unannotated images). (A sanity-check sketch for these counts follows the table.)
Hardware Specification | Yes | With ResNet-50 as backbone, the inference time (on V100) for CQB-Net is 283ms/image while that of Panoptic FPN (3 cascade stages) is 256ms.
Software Dependencies | No | We implement our model using MMDetection (Chen et al. 2019) and train with 8 GPUs. The paper mentions MMDetection but does not specify a version number, nor does it list versions for other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We train for 24 epochs with a batch size of 16, weight decay of 1e-4, and a learning rate of 0.02 with step decay by 0.1 at epochs 18 and 23. We use the SGD optimizer with momentum of 0.9. We use multi-scale training between 1333×400 and 1333×900 pixels with random flipping. The test image scale is 1333×800. We use 2 inter-graph reasoning layers, and the dimensions of f, f_en^th, and f_en^st are 128. We average the extra box classification and semantic segmentation outputs in the relation reasoning module with those from the base panoptic network as in Fig. 3. All other hyperparameters are kept the same as in the original papers. We run 50 ablative experiments for each supervision to approximate Eqn. 8. (A hedged config sketch reconstructing this schedule follows the table.)
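
For the Dataset Splits row, the quoted counts match the official COCO 2017 panoptic release and can be checked directly against the annotation files. The following is a minimal sketch; the file names and the annotations/ directory layout are assumptions based on the standard COCO download, not something the paper specifies.

import json

# Hypothetical sanity check of the quoted COCO 2017 split sizes.
# Assumes the official panoptic annotation files are unpacked under
# annotations/ (these paths are an assumption, not from the paper).
for split in ('train2017', 'val2017'):
    with open(f'annotations/panoptic_{split}.json') as f:
        ann = json.load(f)
    # Expected: ~118K images for train2017, 5K for val2017.
    print(split, len(ann['images']))

# The ~20K test images are distributed without panoptic annotations,
# so there is no panoptic_test2017.json to count.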
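
For the Experiment Setup row, the quoted schedule maps directly onto standard MMDetection configuration fields. The fragment below is a minimal sketch assuming MMDetection-style config conventions; since the paper releases no code, the field names (samples_per_gpu, lr_config, train_pipeline, and so on) are our assumption, and the CQB-Net model section is omitted entirely.

# Hypothetical reconstruction of the reported training schedule in
# MMDetection-style config syntax. Only the settings quoted in the
# paper are shown; the CQB-Net model definition is not public.

# 8 GPUs x 2 images per GPU = effective batch size of 16
data = dict(samples_per_gpu=2, workers_per_gpu=2)

# SGD with lr 0.02, momentum 0.9, weight decay 1e-4
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=1e-4)

# step decay by 0.1 at epochs 18 and 23; 24 epochs total
lr_config = dict(policy='step', step=[18, 23])
total_epochs = 24

# multi-scale training between 1333x400 and 1333x900 with random flipping
train_pipeline = [
    dict(type='Resize', img_scale=[(1333, 400), (1333, 900)],
         multiscale_mode='range', keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
]

# single test scale of 1333x800
test_pipeline = [
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
]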