Per-Pixel Classification is Not All You Need for Semantic Segmentation

Authors: Bowen Cheng, Alex Schwing, Alexander Kirillov

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate MaskFormer on five semantic segmentation datasets with various numbers of categories: Cityscapes [13] (19 classes), Mapillary Vistas [31] (65 classes), ADE20K [49] (150 classes), COCO-Stuff-10K [2] (171 classes), and ADE20K-Full [49] (847 classes). MaskFormer achieves the new state-of-the-art on ADE20K (55.6 mIoU) with Swin-Transformer [27] backbone, outperforming a per-pixel classification model [27] with the same backbone by 2.1 mIoU, while being more efficient (10% reduction in parameters and 40% reduction in FLOPs)."
Researcher Affiliation | Collaboration | ¹Facebook AI Research (FAIR), ²University of Illinois at Urbana-Champaign (UIUC)
Pseudocode | No | The paper includes diagrams and descriptions of the model architecture but does not contain any structured pseudocode or algorithm blocks (a sketch of the described inference step is given after this table).
Open Source Code | No | The paper provides a link to a "Project page" (https://bowenc0221.github.io/maskformer) but does not explicitly state that source code for the methodology is available there, nor does it give a direct link to a code repository.
Open Datasets | Yes | "We study MaskFormer using four widely used semantic segmentation datasets: ADE20K [49] (150 classes) from the SceneParse150 challenge [48], COCO-Stuff-10K [2] (171 classes), Cityscapes [13] (19 classes), and Mapillary Vistas [31] (65 classes). In addition, we use the ADE20K-Full [49] (847 classes) dataset annotated in an open vocabulary setting... For panoptic segmentation evaluation we use COCO [26, 2, 22] (80 'things' and 53 'stuff' categories) and ADE20K-Panoptic [49, 22] (100 'things' and 50 'stuff' categories)."
Dataset Splits | Yes | "We evaluate the models on ADE20K val with 150 categories."
Hardware Specification | Yes | "All models are trained with 8 V100 GPUs."
Software Dependencies | No | The paper mentions using Detectron2 [42] but does not provide specific version numbers for it or any other key software dependencies.
Experiment Setup | Yes | "More specifically, we use AdamW [29] and the poly [6] learning rate schedule with an initial learning rate of 10⁻⁴ and a weight decay of 10⁻⁴ for ResNet [20] backbones, and an initial learning rate of 6×10⁻⁵ and a weight decay of 10⁻² for Swin-Transformer [27] backbones. (...) For the ADE20K dataset, if not stated otherwise, we use a crop size of 512×512, a batch size of 16 and train all models for 160k iterations." (A hedged configuration sketch follows the table.)
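
Although the paper ships no pseudocode (see the Pseudocode row above), the mask-classification semantic inference it describes is compact: per-query class probabilities are marginalized over per-query mask predictions. Below is a minimal, hypothetical PyTorch sketch of that step; the function name, tensor names, and shapes are our assumptions, not the authors' released code.

```python
import torch

def semantic_inference(class_logits: torch.Tensor,
                       mask_logits: torch.Tensor) -> torch.Tensor:
    """Marginalize per-query class probabilities over per-query masks.

    class_logits: [N, K+1] scores for K classes plus a "no object" slot.
    mask_logits:  [N, H, W] binary mask logits, one mask per query.
    Returns a [K, H, W] map of per-pixel class scores.
    """
    # Softmax over K+1 categories, then drop the trailing "no object" class.
    class_probs = class_logits.softmax(dim=-1)[:, :-1]  # [N, K]
    mask_probs = mask_logits.sigmoid()                  # [N, H, W]
    # Per-pixel class score = sum over queries of p(class) * p(mask).
    return torch.einsum("nk,nhw->khw", class_probs, mask_probs)

# Example with made-up shapes: 100 queries, 150 ADE20K classes, 64x64 masks.
cls = torch.randn(100, 151)
msk = torch.randn(100, 64, 64)
segmentation = semantic_inference(cls, msk).argmax(dim=0)  # [64, 64] labels
```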
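The optimizer settings quoted in the Experiment Setup row translate directly into a few lines of configuration. The following is a hedged sketch assuming standard PyTorch APIs; the poly power of 0.9 and all variable names are our assumptions, and the model is a stand-in.

```python
import torch

# Settings for the ResNet-backbone recipe quoted in the table; the paper
# reports lr=6e-5 and weight_decay=1e-2 for Swin backbones instead.
BASE_LR = 1e-4
WEIGHT_DECAY = 1e-4
MAX_ITERS = 160_000  # ADE20K schedule from the quote
POLY_POWER = 0.9     # assumption: a common default for the poly schedule [6]

model = torch.nn.Conv2d(3, 19, kernel_size=1)  # stand-in for the real model
optimizer = torch.optim.AdamW(
    model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)
# Poly schedule: lr(it) = BASE_LR * (1 - it / MAX_ITERS) ** POLY_POWER
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda it: (1.0 - it / MAX_ITERS) ** POLY_POWER)
```

In a training loop, scheduler.step() would be called once per iteration after optimizer.step(), so the learning rate decays smoothly toward zero over the 160k iterations.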