SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation

Authors: Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our SegNeXt significantly improves the performance of previous state-of-the-art methods on popular benchmarks, including ADE20K, Cityscapes, COCO-Stuff, Pascal VOC, Pascal Context, and iSAID. Notably, SegNeXt outperforms EfficientNet-L2 w/ NAS-FPN and achieves 90.6% mIoU on the Pascal VOC 2012 test leaderboard using only 1/10 of its parameters. On average, SegNeXt achieves about 2.0% mIoU improvement over the state-of-the-art methods on the ADE20K dataset with the same or fewer computations.
Researcher Affiliation | Collaboration | Meng-Hao Guo1, Cheng-Ze Lu2, Qibin Hou2, Zheng-Ning Liu3, Ming-Ming Cheng2, Shi-Min Hu1 — 1BNRist, Department of Computer Science and Technology, Tsinghua University; 2TMCC, CS, Nankai University; 3Fitten Tech, Beijing, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Project page: https://github.com/Jittor/JSeg
Open Datasets | Yes | Dataset. We evaluate our methods on seven popular datasets, including ImageNet-1K [14], ADE20K [111], Cityscapes [12], Pascal VOC [17], Pascal Context [65], COCO-Stuff [3], and iSAID [84].
Dataset Splits | Yes | ADE20K [111] is a challenging dataset which contains 150 semantic classes. It consists of 20,210/2,000/3,352 images in the training, validation, and test sets. Cityscapes [12] mainly focuses on urban scenes and contains 5,000 high-resolution images with 19 categories. There are 2,975/500/1,525 images for training, validation, and testing, respectively.
Hardware Specification | Yes | All models are trained on a node with 8 RTX 3090 GPUs. We test our method with a single RTX 3090 GPU and an AMD EPYC 7543 32-core CPU.
Software Dependencies | No | The paper mentions software such as the Jittor [32], PyTorch [68], timm [85], and mmsegmentation [11] libraries, but it does not specify their version numbers.
Experiment Setup | Yes | We adopt some common data augmentations, including random horizontal flipping, random scaling (from 0.5 to 2), and random cropping. The batch size is set to 8 for the Cityscapes dataset and 16 for all the other datasets. AdamW [61] is applied to train our models. We set the initial learning rate to 0.00006 and employ the poly learning-rate decay policy. We train our models for 160K iterations on the ADE20K, Cityscapes, and iSAID datasets and 80K iterations on the COCO-Stuff, Pascal VOC, and Pascal Context datasets.
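The quoted setup names a poly learning-rate decay policy with an initial rate of 6e-5. A minimal pure-Python sketch of that schedule is below; note the decay exponent (power = 0.9, the common mmsegmentation default) is an assumption, since the report quotes only the initial learning rate and the policy name, and the `poly_lr` helper is hypothetical, not from the paper's code.

```python
BASE_LR = 6e-5  # initial learning rate from the quoted setup
POWER = 0.9     # assumed exponent; not stated in the quoted text


def poly_lr(iteration: int, total_iters: int,
            base_lr: float = BASE_LR, power: float = POWER) -> float:
    """Poly decay: lr(t) = base_lr * (1 - t / T) ** power."""
    return base_lr * (1.0 - iteration / total_iters) ** power


# 160K total iterations for ADE20K/Cityscapes/iSAID, 80K for the others.
schedule = [poly_lr(it, 160_000) for it in (0, 40_000, 80_000, 120_000)]
```

The schedule starts at exactly `BASE_LR` and decays monotonically toward zero as the iteration count approaches the total, which is the behavior the poly policy is chosen for in long fixed-iteration segmentation training runs.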