Self-Erasing Network for Integral Object Attention

Authors: Qibin Hou, Peng-Tao Jiang, Yunchao Wei, Ming-Ming Cheng

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on Pascal VOC well demonstrate the superiority of our SeeNet over other state-of-the-art methods. To test the quality of our proposed attention network, we applied the generated attention maps to the recently popular weakly-supervised semantic segmentation task. We evaluate our approach on the PASCAL VOC 2012 image segmentation benchmark [6]. We compare our results with other works on both the validation and test sets. To show the importance of our self-erasing strategies, we perform several ablation experiments in this subsection. Quantitative results on PASCAL VOC 2012.
Researcher Affiliation | Academia | Qibin Hou, Peng-Tao Jiang: College of Computer Science, Nankai University (andrewhoux@gmail.com); Yunchao Wei: University of Illinois at Urbana-Champaign, IL, USA; Ming-Ming Cheng: College of Computer Science, Nankai University (cmm@nankai.edu.cn)
Pseudocode | Yes | Algorithm 1: Proxy ground-truth for training semantic segmentation networks (a hedged sketch of this algorithm appears after the table).
Open Source Code | No | Project page: http://mmcheng.net/SeeNet/. This is a project page, not an explicit statement of code release or a direct link to a code repository for the methodology.
Open Datasets | Yes | We evaluate our approach on the PASCAL VOC 2012 image segmentation benchmark [6], which contains 20 semantic classes plus the background category. As done in most previous works, we train our model for both the attention and segmentation tasks on the training set, which consists of 10,582 images, including the augmented training set provided by [7].
Dataset Splits | Yes | We train our model for both the attention and segmentation tasks on the training set, which consists of 10,582 images, including the augmented training set provided by [7]. We compare our results with other works on both the validation and test sets, which have 1,449 and 1,456 images, respectively.
Hardware Specification | No | The paper states that it uses the VGGNet backbone and the DeepLab-LargeFOV architecture but does not specify any hardware details such as GPU model, CPU, or memory.
Software Dependencies | No | The paper mentions using VGGNet and the DeepLab-LargeFOV architecture and applies CRF post-processing, but it does not give version numbers for any software dependencies.
Experiment Setup | Yes | We set the batch size to 16, weight decay to 0.0002, and learning rate to 0.001, divided by 10 after 15,000 iterations. We run our network for 25,000 iterations in total. For data augmentation, we follow the strategy used in [8]. Thresholds δh and δl in SB are set to 0.7 and 0.05 times the maximum value of the attention map fed into the C-ReLU layer, respectively. For the threshold used in SC, the factor is set to (δh + δl)/2 (a threshold sketch also appears after the table).
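
To make the pseudocode row concrete: Algorithm 1 merges per-class attention maps with saliency maps to produce proxy per-pixel labels for training a segmentation network. The minimal numpy sketch below shows one plausible reading of that step; the function name, default thresholds, and merging/tie-breaking rules are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

IGNORE = 255  # conventional "ignore" label in PASCAL VOC-style training

def proxy_ground_truth(attention, saliency, present_classes,
                       att_thresh=0.5, sal_thresh=0.06):
    """Merge per-class attention maps with a saliency map into proxy labels.

    attention       : (C, H, W) float array, per-class attention in [0, 1]
    saliency        : (H, W) float array, saliency in [0, 1]
    present_classes : 1-based indices of the classes annotated for the image
    Returns an (H, W) uint8 map: 0 = background, c = class c, 255 = ignore.
    """
    c, h, w = attention.shape
    proxy = np.zeros((h, w), dtype=np.uint8)   # everything starts as background

    salient = saliency > sal_thresh            # pixels the saliency map trusts

    # Consider only the attention maps of classes known to be in the image.
    masked = np.full((c, h, w), -np.inf)
    for cls in present_classes:
        masked[cls - 1] = attention[cls - 1]
    best_cls = masked.argmax(axis=0) + 1       # strongest class per pixel
    best_val = masked.max(axis=0)

    confident = best_val > att_thresh
    keep = salient & confident
    proxy[keep] = best_cls.astype(np.uint8)[keep]
    proxy[salient & ~confident] = IGNORE       # salient but ambiguous: skip
    return proxy
```

The ignore label lets the segmentation loss skip pixels where attention and saliency disagree, which is the usual way proxy labels of this kind are consumed during training.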
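
The threshold arithmetic in the experiment-setup row is easy to misread, so here is a small sketch of how the reported factors combine (the function name and array conventions are ours, assumed for illustration): with δh = 0.7 · max and δl = 0.05 · max, the SC factor (δh + δl)/2 works out to 0.375 times the maximum of the attention map.

```python
import numpy as np

def erasing_thresholds(attention, high_factor=0.7, low_factor=0.05):
    """Thresholds for the erasing branches, per the reported setup.

    delta_h and delta_l are 0.7x and 0.05x the maximum of the attention
    map fed into the C-ReLU layer; the SC branch uses their mean,
    i.e. (0.7 + 0.05) / 2 = 0.375 times that maximum.
    """
    peak = float(np.max(attention))
    delta_h = high_factor * peak   # above this: confident object region
    delta_l = low_factor * peak    # below this: likely background
    delta_c = (delta_h + delta_l) / 2.0
    return delta_h, delta_l, delta_c
```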