P2Seg: Pointly-supervised Segmentation via Mutual Distillation

Authors: Zipeng Wang, Xuehui Yu, Xumeng Han, Wenwen Yu, Zhixun Huang, Jianbin Jiao, Zhenjun Han

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments substantiate the efficacy of MDM in fostering the synergy between instance and semantic information, consequently improving the quality of instance-level object representations. Our method achieves 55.7 mAP50 and 17.6 mAP on the PASCAL VOC and MS COCO datasets, significantly outperforming recent PSIS methods and several box-supervised instance segmentation competitors. |
| Researcher Affiliation | Collaboration | Zipeng Wang¹, Xuehui Yu¹, Xumeng Han¹, Wenwen Yu¹, Zhixun Huang², Jianbin Jiao¹, Zhenjun Han¹; ¹University of Chinese Academy of Sciences; ²Xiaomi AI Lab, Beijing, China |
| Pseudocode | No | The paper describes its methods in detail through text and figures, but it does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | Yes | Our code is available at https://github.com/ucas-vg/P2Seg-Public. |
| Open Datasets | Yes | Our experiments are conducted on the PASCAL VOC 2012 (Everingham et al., 2010) and MS COCO 2017 (Lin et al., 2014) datasets. |
| Dataset Splits | Yes | The VOC dataset contains 20 instance classes and is usually augmented with the SBD (Hariharan et al., 2011) dataset; it includes 10,582, 1,449, and 1,464 images for training, validation, and testing, respectively. The COCO dataset contains 80 classes and includes 118k images for training and 5k images for validation. (A loading sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software components like the AdamW optimizer and Mask R-CNN but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The initial learning rate is set as 5 × 10⁻⁵ and the batch size is set as 8. For the experiments on the VOC dataset, we train the network for 50,000 iterations. To ensure the initial pseudo labels are favorable, we warm up the network for 2,000 iterations. For the experiments on the COCO dataset, the total number of iterations is 100,000 and the number of warm-up iterations is 5,000. The weights of the semantic segmentation loss, offset map loss, and instance affinity matrix loss are set as 1.0, 0.01, and 1.0, respectively. The data augmentation strategies used here include random resizing between 0.7 and 1.3, random cropping, random flipping, and photometric distortion. In the fine-tuning phase of the network's instance segmentor, we primarily employ Mask R-CNN (He et al., 2017a) to retrain on the instance segmentation maps obtained from the last stage of MDM. We use the SGD optimizer with a learning rate of 0.02 and a batch size of 16 for 12 epochs on both the VOC and COCO datasets. (A configuration sketch follows the table.) |
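
The dataset-splits row refers to the standard SBD-augmented VOC 2012 training set. The paper does not state which loading pipeline it uses; below is a minimal sketch assuming torchvision's VOCSegmentation and SBDataset loaders, with a hypothetical `DATA_ROOT` directory.

```python
# A minimal sketch, assuming torchvision's built-in loaders; the paper does
# not specify its data pipeline, and DATA_ROOT is a hypothetical path.
from torchvision.datasets import VOCSegmentation, SBDataset

DATA_ROOT = "./data"

# VOC 2012 official splits: 1,464 train and 1,449 val images.
voc_train = VOCSegmentation(DATA_ROOT, year="2012", image_set="train")
voc_val = VOCSegmentation(DATA_ROOT, year="2012", image_set="val")

# SBD supplies the extra annotations used to build the augmented
# (10,582-image) training set; "train_noval" excludes VOC 2012 val
# images so the validation set stays untouched.
sbd_train = SBDataset(DATA_ROOT, image_set="train_noval", mode="segmentation")

print(len(voc_train), len(voc_val), len(sbd_train))
```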
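The experiment-setup row can be collected into a single training configuration. The sketch below is a non-authoritative summary: the dictionary layout and the `build_mdm_optimizer` helper are assumptions of this report, and only the hyperparameter values come from the paper.

```python
# A hedged summary of the hyperparameters reported in the paper; the config
# structure and helper function are hypothetical, not the authors' code.
import torch

MDM_CONFIG = {
    "optimizer": "AdamW",
    "lr": 5e-5,
    "batch_size": 8,
    "iterations": {"voc": 50_000, "coco": 100_000},
    "warmup_iterations": {"voc": 2_000, "coco": 5_000},
    # Loss weights: semantic segmentation, offset map, instance affinity matrix.
    "loss_weights": {"semantic": 1.0, "offset": 0.01, "affinity": 1.0},
    "augmentation": {
        "random_resize_range": (0.7, 1.3),
        "random_crop": True,
        "random_flip": True,
        "photometric_distortion": True,
    },
}

# Fine-tuning stage: Mask R-CNN retrained on the MDM pseudo labels.
FINETUNE_CONFIG = {
    "detector": "Mask R-CNN",
    "optimizer": "SGD",
    "lr": 0.02,
    "batch_size": 16,
    "epochs": 12,  # same schedule for both VOC and COCO
}

def build_mdm_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """AdamW optimizer with the learning rate reported for MDM training."""
    return torch.optim.AdamW(model.parameters(), lr=MDM_CONFIG["lr"])
```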