Per-Pixel Classification is Not All You Need for Semantic Segmentation
Authors: Bowen Cheng, Alex Schwing, Alexander Kirillov
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate MaskFormer on five semantic segmentation datasets with various numbers of categories: Cityscapes [13] (19 classes), Mapillary Vistas [31] (65 classes), ADE20K [49] (150 classes), COCO-Stuff-10K [2] (171 classes), and ADE20K-Full [49] (847 classes). MaskFormer achieves the new state-of-the-art on ADE20K (55.6 mIoU) with Swin-Transformer [27] backbone, outperforming a per-pixel classification model [27] with the same backbone by 2.1 mIoU, while being more efficient (10% reduction in parameters and 40% reduction in FLOPs). |
| Researcher Affiliation | Collaboration | 1Facebook AI Research (FAIR) 2University of Illinois at Urbana-Champaign (UIUC) |
| Pseudocode | No | The paper includes diagrams and descriptions of the model architecture but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a link to a 'Project page' (https://bowenc0221.github.io/maskformer) but does not explicitly state that source code for the methodology is provided on this page or give a direct link to a code repository. |
| Open Datasets | Yes | We study MaskFormer using four widely used semantic segmentation datasets: ADE20K [49] (150 classes) from the SceneParse150 challenge [48], COCO-Stuff-10K [2] (171 classes), Cityscapes [13] (19 classes), and Mapillary Vistas [31] (65 classes). In addition, we use the ADE20K-Full [49] (847 classes) dataset annotated in an open vocabulary setting... For panoptic segmentation evaluation we use COCO [26, 2, 22] (80 things and 53 stuff categories) and ADE20K-Panoptic [49, 22] (100 things and 50 stuff categories). |
| Dataset Splits | Yes | We evaluate the models on ADE20K val with 150 categories. |
| Hardware Specification | Yes | All models are trained with 8 V100 GPUs. |
| Software Dependencies | No | The paper mentions using Detectron2 [42] but does not provide specific version numbers for it or any other key software dependencies. |
| Experiment Setup | Yes | More specifically, we use AdamW [29] and the poly [6] learning rate schedule with an initial learning rate of 10⁻⁴ and a weight decay of 10⁻⁴ for ResNet [20] backbones, and an initial learning rate of 6×10⁻⁵ and a weight decay of 10⁻² for Swin-Transformer [27] backbones. (...) For the ADE20K dataset, if not stated otherwise, we use a crop size of 512×512, a batch size of 16 and train all models for 160k iterations. |
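The training recipe quoted in the Experiment Setup row (AdamW with a poly learning-rate schedule over 160k iterations) can be sketched as below. The poly schedule scales the base rate by (1 − step/max_steps)^power; the paper does not quote its power value, so the common default of 0.9 is an assumption here, and the helper name `poly_lr` is illustrative, not from the paper.

```python
def poly_lr(base_lr: float, step: int, max_steps: int, power: float = 0.9) -> float:
    """Polynomial ("poly") learning-rate decay.

    Returns base_lr * (1 - step / max_steps) ** power, decaying
    from base_lr at step 0 to 0 at step == max_steps.
    power = 0.9 is a conventional default, not stated in the paper.
    """
    return base_lr * (1.0 - step / max_steps) ** power


# ADE20K settings quoted in the table: 160k iterations,
# base LR 1e-4 for ResNet backbones (6e-5 for Swin backbones).
base_lr = 1e-4
max_steps = 160_000

for step in (0, 40_000, 80_000, 120_000, 160_000):
    print(f"step {step:>7}: lr = {poly_lr(base_lr, step, max_steps):.3e}")
```

In a real training loop this would typically be wired into the optimizer each iteration (e.g. via PyTorch's `torch.optim.lr_scheduler.LambdaLR` with the same multiplicative factor).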