Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation

Authors: Xueyi Li, Tianfei Zhou, Jianwu Li, Yi Zhou, Zhaoxiang Zhang

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on the popular PASCAL VOC 2012 and COCO benchmarks, and our model yields state-of-the-art performance.
Researcher Affiliation | Academia | 1 Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science and Technology, Beijing Institute of Technology, China; 2 Computer Vision Laboratory, ETH Zurich, Switzerland; 3 School of Computer Science and Engineering, Southeast University, China; 4 Center for Research on Intelligent Perception and Computing, CASIA, China. Contact: xueyili@bit.edu.cn, tianfei.zhou@vision.ee.ethz.ch, ljw@bit.edu.cn
Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our code is available at: https://github.com/Lixy1997/Group-WSSS.
Open Datasets | Yes | We conduct our experiments on two datasets: PASCAL VOC 2012 (Everingham et al. 2010) and COCO (Lin et al. 2014). Following standard protocol (Huang et al. 2018; Lee et al. 2019; Wang et al. 2020b), extra data from SBD (Hariharan et al. 2011) is also used for training, leading to a total of 10,582 training images.
Dataset Splits | Yes | For PASCAL VOC 2012: we evaluate our model on the standard validation and test sets, which have 1,449 and 1,456 images, respectively. For COCO: following (Wang et al. 2020a), we use the default train/val splits (80k images for training and 40k for validation) in the experiment.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models or memory specifications.
Software Dependencies | No | The paper mentions software components such as VGG16, ResNet101, DeepLab-v2, and the SGD optimizer, but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | For the classification network, the number of nodes K and message passing steps T in the GNN are set to 4 and 3, respectively, by default. The input image size is 224 × 224. The entire network is trained using the SGD optimizer with initial learning rates of 1e-3 for the backbone and 1e-2 for the GNN, both reduced by a factor of 0.1 every five epochs. The total number of epochs, momentum, and weight decay are set to 15, 0.9, and 5e-4, respectively. The λ in Eq. (12) is empirically set to 0.4 and the d in Eq. (5) is set to 4.
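The hyper-parameters quoted in the Experiment Setup row map directly onto a standard optimizer configuration. Below is a minimal sketch, assuming a PyTorch implementation; the `backbone` and `gnn` modules are hypothetical placeholders for the paper's classification backbone (VGG16/ResNet101) and its GNN, and this is not the authors' implementation (their actual code is at the GitHub link above). Only the hyper-parameter values come from the paper.

```python
# Minimal sketch of the quoted training setup, assuming PyTorch.
import torch.nn as nn
import torch.optim as optim

K, T = 4, 3          # GNN nodes and message-passing steps (paper defaults)
INPUT_SIZE = 224     # images are resized to 224 x 224
NUM_EPOCHS = 15      # total training epochs
LAMBDA = 0.4         # the lambda in the paper's Eq. (12)
D = 4                # the d in the paper's Eq. (5)

# Hypothetical stand-in modules, NOT the paper's actual architecture.
backbone = nn.Conv2d(3, 64, kernel_size=3, padding=1)
gnn = nn.Linear(64, 20)

# SGD with per-module initial learning rates (1e-3 backbone, 1e-2 GNN),
# momentum 0.9, and weight decay 5e-4, per the quoted setup.
optimizer = optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-3},
        {"params": gnn.parameters(), "lr": 1e-2},
    ],
    momentum=0.9,
    weight_decay=5e-4,
)

# Both learning rates are reduced by a factor of 0.1 every five epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

for epoch in range(NUM_EPOCHS):
    # ... a training pass over groups of K images would go here ...
    scheduler.step()
```

Using two parameter groups in a single SGD optimizer is the idiomatic way to give the pretrained backbone a smaller learning rate than the freshly initialized GNN while sharing one momentum/weight-decay setting and one decay schedule.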