K-Net: Towards Unified Image Segmentation
Authors: Wenwei Zhang, Jiangmiao Pang, Kai Chen, Chen Change Loy
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | K-Net surpasses all previous published state-of-the-art single-model results of panoptic segmentation on MS COCO test-dev split and semantic segmentation on ADE20K val split with 55.2% PQ and 54.3% mIoU, respectively. Its instance segmentation performance is also on par with Cascade Mask R-CNN on MS COCO with 60%-90% faster inference speeds. Code and models will be released at https://github.com/ZwwWayne/K-Net/. To show the effectiveness of the proposed unified framework on different segmentation tasks, we conduct extensive experiments on COCO dataset [38] for panoptic and instance segmentation, and ADE20K dataset [70] for semantic segmentation. |
| Researcher Affiliation | Collaboration | 1S-Lab, Nanyang Technological University 2CUHK-SenseTime Joint Lab, the Chinese University of Hong Kong 3SenseTime Research 4Shanghai AI Laboratory {wenwei001, ccloy}@ntu.edu.sg pangjiangmiao@gmail.com chenkai@sensetime.com |
| Pseudocode | No | The paper provides architectural diagrams (Figure 2, Figure 3) and describes the steps of the method in text, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code and models will be released at https://github.com/ZwwWayne/K-Net/. |
| Open Datasets | Yes | we conduct extensive experiments on COCO dataset [38] for panoptic and instance segmentation, and ADE20K dataset [70] for semantic segmentation. |
| Dataset Splits | Yes | All models are trained on the train2017 split and evaluated on the val2017 split. ... All models are trained on the train split and evaluated on the validation split. |
| Hardware Specification | No | The paper mentions training on '16 GPUs' and '44 GPU days' but does not specify the type or model of the GPUs or other hardware components. |
| Software Dependencies | No | For panoptic and instance segmentation, we implement K-Net with MMDetection [6]. ... For semantic segmentation, we implement K-Net with MMSegmentation [13]. The paper mentions software frameworks but does not provide specific version numbers for these or other libraries/dependencies. |
| Experiment Setup | Yes | In the ablation study, the model is trained with a batch size of 16 for 12 epochs. The learning rate is 0.0001, and it is decreased by 0.1 after 8 and 11 epochs, respectively. We use Adam W [41] with a weight decay of 0.05. For data augmentation in training, we adopt horizontal flip augmentation with a single scale. The long edge and short edge of images are resized to 1333 and 800, respectively, without changing the aspect ratio. When comparing with other frameworks, we use multi-scale training with a longer schedule (36 epochs) for fair comparisons [6]. The short edge of images is randomly sampled from [640, 800] [21]. For semantic segmentation, we implement K-Net with MMSegmentation [13] and train it with 80,000 iterations. As Adam W [41] empirically works better than SGD, we use Adam W with a weight decay of 0.0005 by default on both the baselines and K-Net for a fair comparison. The initial learning rate is 0.0001, and it is decayed by 0.1 after 60000 and 72000 iterations, respectively. |
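The ablation schedule quoted above (initial learning rate 0.0001, decayed by a factor of 0.1 after epochs 8 and 11 over a 12-epoch run) amounts to a standard step-decay policy. A minimal sketch of that decay rule, assuming the decay is applied at the start of each milestone epoch; the helper name `step_lr` is illustrative and not from the paper:

```python
def step_lr(epoch, base_lr=1e-4, milestones=(8, 11), gamma=0.1):
    """Step-decay schedule: scale base_lr by gamma once per milestone passed.

    Mirrors the ablation setup quoted above: lr = 1e-4, decayed by 0.1
    after epochs 8 and 11 (milestone indexing is an assumption here).
    """
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Learning rate at each of the 12 ablation epochs
schedule = [step_lr(e) for e in range(12)]
```

In practice this is what a `MultiStepLR`-style scheduler computes in common training frameworks; the 80,000-iteration semantic-segmentation run follows the same shape with milestones at 60,000 and 72,000 iterations.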