K-Net: Towards Unified Image Segmentation

Authors: Wenwei Zhang, Jiangmiao Pang, Kai Chen, Chen Change Loy

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental K-Net surpasses all previous published state-of-the-art single-model results of panoptic segmentation on MS COCO test-dev split and semantic segmentation on ADE20K val split with 55.2% PQ and 54.3% mIoU, respectively. Its instance segmentation performance is also on par with Cascade Mask R-CNN on MS COCO with 60%-90% faster inference speeds. Code and models will be released at https://github.com/ZwwWayne/K-Net/. To show the effectiveness of the proposed unified framework on different segmentation tasks, we conduct extensive experiments on COCO dataset [38] for panoptic and instance segmentation, and ADE20K dataset [70] for semantic segmentation.
Researcher Affiliation Collaboration 1S-Lab, Nanyang Technological University 2CUHK-SenseTime Joint Lab, the Chinese University of Hong Kong 3SenseTime Research 4Shanghai AI Laboratory {wenwei001, ccloy}@ntu.edu.sg pangjiangmiao@gmail.com chenkai@sensetime.com
Pseudocode No The paper provides architectural diagrams (Figure 2, Figure 3) and describes the steps of the method in text, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code Yes Code and models will be released at https://github.com/ZwwWayne/K-Net/.
Open Datasets Yes we conduct extensive experiments on COCO dataset [38] for panoptic and instance segmentation, and ADE20K dataset [70] for semantic segmentation.
Dataset Splits Yes All models are trained on the train2017 split and evaluated on the val2017 split. ... All models are trained on the train split and evaluated on the validation split.
Hardware Specification No The paper mentions training on '16 GPUs' and '44 GPU days' but does not specify the type or model of the GPUs or other hardware components.
Software Dependencies No For panoptic and instance segmentation, we implement K-Net with MMDetection [6]. ... For semantic segmentation, we implement K-Net with MMSegmentation [13]. The paper mentions software frameworks but does not provide specific version numbers for these or other libraries/dependencies.
Experiment Setup Yes In the ablation study, the model is trained with a batch size of 16 for 12 epochs. The learning rate is 0.0001, and it is decreased by 0.1 after 8 and 11 epochs, respectively. We use AdamW [41] with a weight decay of 0.05. For data augmentation in training, we adopt horizontal flip augmentation with a single scale. The long edge and short edge of images are resized to 1333 and 800, respectively, without changing the aspect ratio. When comparing with other frameworks, we use multi-scale training with a longer schedule (36 epochs) for fair comparisons [6]. The short edge of images is randomly sampled from [640, 800] [21]. For semantic segmentation, we implement K-Net with MMSegmentation [13] and train it with 80,000 iterations. As AdamW [41] empirically works better than SGD, we use AdamW with a weight decay of 0.0005 by default on both the baselines and K-Net for a fair comparison. The initial learning rate is 0.0001, and it is decayed by 0.1 after 60000 and 72000 iterations, respectively.
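The quoted setup describes a standard step-decay learning-rate schedule: a base rate of 1e-4 multiplied by 0.1 at each milestone (epochs 8 and 11 for the ablation schedule; iterations 60,000 and 72,000 for semantic segmentation). A minimal plain-Python sketch of that schedule follows; it is an illustration, not the authors' MMDetection/MMSegmentation config, and the "decay once a step reaches the milestone" convention is an assumption.

```python
def step_lr(base_lr, milestones, gamma, step):
    """Step-decay schedule: multiply base_lr by gamma once for every
    milestone the current step (epoch or iteration) has reached."""
    num_decays = sum(1 for m in milestones if step >= m)
    return base_lr * (gamma ** num_decays)

# Ablation schedule from the paper: lr 1e-4, decayed by 0.1 after epochs 8 and 11.
ablation_lrs = [step_lr(1e-4, [8, 11], 0.1, epoch) for epoch in range(12)]

# Semantic segmentation schedule: lr 1e-4, decayed at iterations 60,000 and 72,000.
semseg_lr_final = step_lr(1e-4, [60000, 72000], 0.1, 79999)
```

The same helper covers both schedules because only the milestone units differ (epochs vs. iterations); in the actual codebase this would be handled by the framework's built-in step scheduler rather than hand-rolled.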