Self-Erasing Network for Integral Object Attention
Authors: Qibin Hou, Peng-Tao Jiang, Yunchao Wei, Ming-Ming Cheng
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on PASCAL VOC well demonstrate the superiority of our SeeNet over other state-of-the-art methods. To test the quality of our proposed attention network, we applied the generated attention maps to the recently popular weakly-supervised semantic segmentation task. We evaluate our approach on the PASCAL VOC 2012 image segmentation benchmark [6]. We compare our results with other works on both the validation and test sets. To show the importance of our self-erasing strategies, we perform several ablation experiments in this subsection. Quantitative results on PASCAL VOC 2012. |
| Researcher Affiliation | Academia | Qibin Hou, Peng-Tao Jiang (College of Computer Science, Nankai University; andrewhoux@gmail.com); Yunchao Wei (UIUC, Urbana-Champaign, IL, USA); Ming-Ming Cheng (College of Computer Science, Nankai University; cmm@nankai.edu.cn) |
| Pseudocode | Yes | Algorithm 1: Proxy ground-truth for training semantic segmentation networks (an illustrative sketch of this kind of proxy-label fusion appears after the table). |
| Open Source Code | No | Project page: http://mmcheng.net/SeeNet/. This is a project page, not an explicit statement of code release or a direct link to a code repository for the methodology. |
| Open Datasets | Yes | We evaluate our approach on the PASCAL VOC 2012 image segmentation benchmark [6], which contains 20 semantic classes plus the background category. As done in most previous works, we train our model for both the attention and segmentation tasks on the training set, which consists of 10,582 images, including the augmented training set provided by [7]. |
| Dataset Splits | Yes | We train our model for both the attention and segmentation tasks on the training set, which consists of 10,582 images, including the augmented training set provided by [7]. We compare our results with other works on both the validation and test sets, which have 1,449 and 1,456 images, respectively. |
| Hardware Specification | No | The paper states it uses VGGNet and the DeepLab-LargeFOV architecture but does not specify any hardware details such as GPU model, CPU, or memory. |
| Software Dependencies | No | The paper mentions using VGGNet and the DeepLab-LargeFOV architecture, and applies CRF-based post-processing, but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | We set the batch size to 16, weight decay 0.0002, and learning rate 0.001, divided by 10 after 15,000 iterations. We run our network for a total of 25,000 iterations. For data augmentation, we follow the strategy used in [8]. Thresholds δh and δl in SB are set to 0.7 and 0.05 times the maximum value of the attention map inputted to the C-ReLU layer, respectively. For the threshold used in SC, the factor is set to (δh + δl)/2 (see the threshold sketch after the table). |
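
The Pseudocode row names the paper's Algorithm 1 but does not reproduce its steps. Below is a minimal, hypothetical sketch of the general recipe such an algorithm follows: fusing per-class attention maps with a saliency map into a proxy segmentation label map. The function name, the `att_thresh` and `sal_thresh` parameters, and the zone-assignment rules are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

IGNORE = 255  # conventional "ignore" label in PASCAL VOC segmentation masks

def proxy_ground_truth(attention, saliency, image_labels,
                       att_thresh=0.5, sal_thresh=0.3):
    """Fuse per-class attention maps with a saliency map into proxy labels.

    attention:    float array (C, H, W), per-class attention in [0, 1]
    saliency:     float array (H, W), saliency in [0, 1]
    image_labels: list of 0-based class indices present in the image
    Returns an (H, W) uint8 label map (0 = background, 255 = ignore).
    """
    _, H, W = attention.shape
    proxy = np.full((H, W), IGNORE, dtype=np.uint8)

    # Strongest foreground evidence at each pixel, over the classes present.
    if image_labels:
        fg_score = attention[image_labels].max(axis=0)
    else:
        fg_score = np.zeros((H, W), dtype=attention.dtype)

    # Background: neither saliency nor any attention map claims the pixel.
    proxy[(saliency < sal_thresh) & (fg_score < att_thresh)] = 0

    # Foreground: confidently attended pixels take their best-scoring class
    # (PASCAL VOC segmentation labels are 1..20, hence the +1 offset).
    if image_labels:
        best = np.argmax(attention[image_labels], axis=0)
        confident = fg_score >= att_thresh
        labels = np.asarray(image_labels, dtype=np.uint8) + 1
        proxy[confident] = labels[best[confident]]

    # Pixels with conflicting or weak evidence remain IGNORE.
    return proxy
```

Training a segmentation network on such proxy labels typically uses a cross-entropy loss that skips the IGNORE value, which is why ambiguous pixels are left unlabeled rather than forced into a class.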
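
The thresholding rule quoted in the Experiment Setup row is compact enough to express directly. The NumPy sketch below shows how δh and δl could be derived as fractions of an attention map's maximum, and how the SC threshold follows from the quoted midpoint factor (δh + δl)/2. The +1 / 0 / −1 encoding of the three zones is an assumption made here for illustration, not the paper's exact tensor layout.

```python
import numpy as np

def zone_mask(att, hi_factor=0.7, lo_factor=0.05):
    """Split a float attention map into three zones via relative thresholds.

    The high/low thresholds follow the quoted rule: fractions of the map's
    maximum value. The ternary encoding is an illustrative assumption.
    """
    hi = hi_factor * att.max()   # delta_h = 0.7  * max(att)
    lo = lo_factor * att.max()   # delta_l = 0.05 * max(att)
    mask = np.zeros_like(att)
    mask[att >= hi] = 1.0        # attention zone
    mask[att < lo] = -1.0        # background zone
    return mask                  # pixels left at 0 form the potential zone

def sc_threshold(att, hi_factor=0.7, lo_factor=0.05):
    # The quoted factor for the SC branch is the midpoint of the two factors.
    return 0.5 * (hi_factor + lo_factor) * att.max()
```

Because both thresholds scale with the map's maximum rather than being fixed constants, the zone split adapts to the dynamic range of each attention map.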