Expansion and Shrinkage of Localization for Weakly-Supervised Semantic Segmentation
Authors: Jinlong Li, Zequn Jie, Xu Wang, Xiaolin Wei, Lin Ma
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct various experiments on PASCAL VOC 2012 and MS COCO 2014 to well demonstrate the superiority of our method over other state-of-the-art methods for Weakly-Supervised Semantic Segmentation. The code is available at https://github.com/TyroneLi/ESOL_WSSS. |
| Researcher Affiliation | Collaboration | Jinlong Li (1, 2), Zequn Jie (2), Xu Wang (1), Xiaolin Wei (2), Lin Ma (2). (1) College of Computer Science and Software Engineering, Shenzhen University, China; (2) Meituan Inc. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. The pipeline is described through text and diagrams. |
| Open Source Code | Yes | The code is available at https://github.com/TyroneLi/ESOL_WSSS. |
| Open Datasets | Yes | Experiments are conducted on two publicly available datasets, PASCAL VOC 2012 [14] and MS COCO 2014 [37]. The Pascal VOC 2012 dataset contains 20 foreground categories and the background. It has three sets, the training, validation, and test set, each containing 1464, 1449 and 1456 images, respectively. Following most previous works [5, 30, 51, 53, 62], we also adopt the augmented training set [17] to yield totally 10582 training images. The MS COCO 2014 dataset has 80 foreground categories, including approximately 82K training images and 4K validation images. |
| Dataset Splits | Yes | The Pascal VOC 2012 dataset... has three sets, the training, validation, and test set, each containing 1464, 1449 and 1456 images, respectively. Following most previous works [5, 30, 51, 53, 62], we also adopt the augmented training set [17] to yield totally 10582 training images. The MS COCO 2014 dataset has 80 foreground categories, including approximately 82K training images and 4K validation images. We evaluate our method on 1449 validation images and 1456 test images from the PASCAL VOC 2012 dataset and on 40504 validation images from the MS COCO 2014 datasets. |
| Hardware Specification | Yes | We implement CAM [65] by following the procedure from Ahn et al. [1], implemented with the PyTorch framework [43] on 12G Nvidia XP Graphic Cards. |
| Software Dependencies | No | The paper mentions the 'PyTorch framework [43]' but gives no version number for PyTorch or for any other library or dependency, so the software environment cannot be reproduced from the paper alone. |
| Experiment Setup | Yes | We adopt the ResNet-50 [18] as backbone for the classification model. For Expansion stage, we train the network 6610 iterations for PASCAL VOC 2012 and 51730 iterations for MS COCO 2014. To carefully train the model with a loss maximization, we set a relatively small learning rate, 0.01 and 0.001 for PASCAL VOC 2012 and MS COCO 2014, while the controlling parameter α is set to 0.001. For Shrinkage stage, the network is initialized from the Expansion model weights. The learning rate is set to be 0.1 and 0.02, and the training iteration is set to be 6610 and 51730 for PASCAL VOC 2012 and MS COCO 2014, respectively. The threshold value of the hand-craft feature clipping strategy is 0.15. The γ and µ are both set to be 1.0. To generate reliable initial localization maps, the scale ratio of multi-scale CAM is {0.5, 1.0, 1.5, 2.0}. |
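
The hyperparameters quoted in the Experiment Setup row can be gathered into one configuration object per dataset, which makes the two-stage schedule easier to compare. The following is a minimal sketch only: the class names, field names, and dictionary keys (`StageConfig`, `ESOLConfig`, `CONFIGS`, `voc12`, `coco14`) are hypothetical and are not taken from the authors' repository; only the numeric values come from the row above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StageConfig:
    """Settings for one training stage (field names are hypothetical)."""
    learning_rate: float
    iterations: int


@dataclass(frozen=True)
class ESOLConfig:
    """Values quoted in the Experiment Setup row; names are illustrative."""
    dataset: str
    expansion: StageConfig   # Expansion stage (trained with loss maximization)
    shrinkage: StageConfig   # Shrinkage stage (initialized from Expansion weights)
    backbone: str = "resnet50"
    alpha: float = 0.001          # controlling parameter for the Expansion stage
    clip_threshold: float = 0.15  # feature clipping threshold
    gamma: float = 1.0
    mu: float = 1.0
    cam_scales: Tuple[float, ...] = (0.5, 1.0, 1.5, 2.0)  # multi-scale CAM ratios


# One entry per dataset; the keys are assumed shorthand, not official names.
CONFIGS = {
    "voc12": ESOLConfig(
        dataset="PASCAL VOC 2012",
        expansion=StageConfig(learning_rate=0.01, iterations=6610),
        shrinkage=StageConfig(learning_rate=0.1, iterations=6610),
    ),
    "coco14": ESOLConfig(
        dataset="MS COCO 2014",
        expansion=StageConfig(learning_rate=0.001, iterations=51730),
        shrinkage=StageConfig(learning_rate=0.02, iterations=51730),
    ),
}
```

Written this way, the pattern in the quoted setup is visible at a glance: on both datasets the Expansion stage uses a smaller learning rate than the Shrinkage stage, while the iteration count is the same for the two stages.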