SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation
Authors: Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our SegNeXt significantly improves the performance of previous state-of-the-art methods on popular benchmarks, including ADE20K, Cityscapes, COCO-Stuff, Pascal VOC, Pascal Context, and iSAID. Notably, SegNeXt outperforms EfficientNet-L2 w/ NAS-FPN and achieves 90.6% mIoU on the Pascal VOC 2012 test leaderboard using only 1/10 parameters of it. On average, SegNeXt achieves about 2.0% mIoU improvements compared to the state-of-the-art methods on the ADE20K datasets with the same or fewer computations. |
| Researcher Affiliation | Collaboration | Meng-Hao Guo1 Cheng-Ze Lu2 Qibin Hou2 Zheng-Ning Liu3 Ming-Ming Cheng2 Shi-Min Hu1 1BNRist, Department of Computer Science and Technology, Tsinghua University 2TMCC, CS, Nankai University 3Fitten Tech, Beijing, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://github.com/Jittor/JSeg |
| Open Datasets | Yes | Dataset. We evaluate our methods on seven popular datasets, including ImageNet-1K [14], ADE20K [111], Cityscapes [12], Pascal VOC [17], Pascal Context [65], COCO-Stuff [3], and iSAID [84]. |
| Dataset Splits | Yes | ADE20K [111] is a challenging dataset which contains 150 semantic classes. It consists of 20,210/2,000/3,352 images in the training, validation and test sets. Cityscapes [12] mainly focuses on urban scenes and contains 5,000 high-resolution images with 19 categories. There are 2,975/500/1,525 images for training, validation and testing, respectively. |
| Hardware Specification | Yes | All models are trained on a node with 8 RTX 3090 GPUs. We test our method with a single RTX-3090 GPU and AMD EPYC 7543 32-core processor CPU. |
| Software Dependencies | No | The paper mentions software like Jittor [32], PyTorch [68], timm [85], and mmsegmentation [11] libraries, but it does not specify their version numbers. |
| Experiment Setup | Yes | We adopt some common data augmentations, including random horizontal flipping, random scaling (from 0.5 to 2) and random cropping. The batch size is set to 8 for the Cityscapes dataset and 16 for all the other datasets. AdamW [61] is applied to train our models. We set the initial learning rate as 0.00006 and employ the poly learning rate decay policy. We train our models for 160K iterations on the ADE20K, Cityscapes and iSAID datasets and 80K iterations on the COCO-Stuff, Pascal VOC and Pascal Context datasets. |
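The "poly" learning rate decay policy quoted in the Experiment Setup row can be sketched as a small function. This is a minimal illustration, not the authors' code: the decay exponent `power=0.9` is a common default in semantic segmentation toolkits (e.g. mmsegmentation) and is an assumption here, since the paper excerpt above does not state it; the initial learning rate (6e-5) and iteration count (160K for ADE20K) are taken from the quote.

```python
def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Poly learning-rate decay: lr = base_lr * (1 - cur_iter / max_iter) ** power.

    The learning rate starts at base_lr and decays smoothly to 0 at max_iter.
    power=0.9 is a common segmentation default, assumed here (not stated in the excerpt).
    """
    return base_lr * (1.0 - cur_iter / max_iter) ** power

# Settings quoted from the paper: initial LR 0.00006, 160K iterations for ADE20K.
BASE_LR, MAX_ITER = 6e-5, 160_000

lr_start = poly_lr(BASE_LR, 0, MAX_ITER)       # full base LR at iteration 0
lr_mid = poly_lr(BASE_LR, 80_000, MAX_ITER)    # decayed value at the halfway point
lr_end = poly_lr(BASE_LR, MAX_ITER, MAX_ITER)  # reaches 0 at the final iteration
```

A scheduler like this is typically applied per iteration (not per epoch), matching the paper's iteration-based training budgets (160K/80K iterations).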