RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer
Authors: Jian Wang, Chenhui Gou, Qiman Wu, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on mainstream benchmarks demonstrate the effectiveness of our proposed RTFormer; it achieves state-of-the-art results on Cityscapes, CamVid and COCOStuff, and shows promising results on ADE20K. Code is available at PaddleSeg[24]: https://github.com/PaddlePaddle/PaddleSeg. In this section, we validate RTFormer on Cityscapes[10], CamVid[4], ADE20K[52] and COCOStuff[5]. We first introduce the datasets with their training details. Then, we compare RTFormer with state-of-the-art real-time methods on Cityscapes and CamVid. Besides, more experiments on ADE20K[52] and COCOStuff[5] are summarised to further prove the generality of our method. Finally, ablation studies of the different design modules within the RTFormer block on ADE20K[52] are provided. |
| Researcher Affiliation | Collaboration | Jian Wang¹, Chenhui Gou², Qiman Wu¹, Haocheng Feng¹, Junyu Han¹, Errui Ding¹, Jingdong Wang¹; ¹Baidu VIS, ²Australian National University (ANU); {wangjian33, wuqiman, fenghaocheng, hanjunyu, dingerrui, wangjingdong}@baidu.com, u7194588@anu.edu.au |
| Pseudocode | No | The paper includes figures illustrating the architecture and attention mechanisms but does not contain any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at PaddleSeg[24]: https://github.com/PaddlePaddle/PaddleSeg. |
| Open Datasets | Yes | In this section, we validate RTFormer on Cityscapes[10], CamVid[4], ADE20K[52] and COCOStuff[5]. All models are pretrained on ImageNet[11]. |
| Dataset Splits | Yes | Cityscapes[10]... contains 2975, 500 and 1525 fine annotated images for training, validation, and testing respectively. CamVid[4]... divided into 367 training images, 101 validation images, and 233 testing images. ADE20K[52]... split 20K, 2K, and 3K images for training, validation, and testing, respectively. COCOStuff[5]... contains 10K images (9K for training and 1K for testing). |
| Hardware Specification | Yes | The FPS is measured on an RTX 2080Ti without TensorRT acceleration by default. All models are trained with 484 epochs (about 120K iterations), a batch size of 12, and SyncBN on four V100 GPUs. FPS is calculated under the same input scale as used for performance measurement (a minimal FPS-timing sketch is given after the table). In this table, * means we retrain this method following its original training setting, and a second marker means we measure the FPS on a single RTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions 'PaddleSeg[24]' as a toolkit but does not specify version numbers for any key software components or libraries (e.g., Python, PaddlePaddle, CUDA versions) used for the implementation or experiments. |
| Experiment Setup | Yes | We train all models using the AdamW optimizer with an initial learning rate of 0.0004 and a weight decay of 0.0125. We adopt the poly learning rate policy with a power of 0.9 to decay the learning rate, and apply data augmentation including random cropping to 512×1024, random scaling in the range of 0.5 to 2.0, and random horizontal flipping. All models are trained for 484 epochs (about 120K iterations) with a batch size of 12 and SyncBN on four V100 GPUs. (An optimizer/schedule sketch is given after the table.) |
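
For reference, below is a minimal sketch of how FPS is typically measured under a fixed input scale, as described in the Hardware Specification row. This is not the authors' benchmarking code (the official RTFormer implementation is in PaddleSeg); it assumes a PyTorch model, and the input size, warm-up count, and iteration count are illustrative choices.

```python
# Hypothetical FPS-measurement sketch; not taken from the paper's PaddleSeg code.
import time
import torch


def measure_fps(model, input_size=(1, 3, 512, 1024), warmup=10, iters=100):
    """Time forward passes at the same input scale used for accuracy evaluation."""
    model = model.cuda().eval()
    x = torch.randn(*input_size, device="cuda")
    with torch.no_grad():
        for _ in range(warmup):          # warm-up passes stabilise GPU clocks / autotuning
            model(x)
        torch.cuda.synchronize()         # ensure warm-up kernels have finished
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()         # wait for all kernels before stopping the clock
    return iters / (time.perf_counter() - start)
```

The explicit `torch.cuda.synchronize()` calls matter: without them, asynchronous kernel launches would make the wall-clock timing appear far faster than the actual inference throughput.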
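
The Experiment Setup row quotes the optimizer, poly learning-rate policy, and augmentation. The snippet below is a minimal sketch of that reported schedule, assuming a PyTorch-style training loop; the authors' actual configuration lives in the PaddleSeg repository, and `build_optimizer_and_scheduler` is an illustrative helper, not an API from that codebase.

```python
# Illustrative sketch of the reported hyper-parameters; not the authors' PaddleSeg config.
import torch


def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """Poly policy as reported: lr = base_lr * (1 - cur_iter / max_iter) ** power."""
    return base_lr * max(1.0 - cur_iter / max_iter, 0.0) ** power


def build_optimizer_and_scheduler(model, max_iter=120_000):
    # AdamW with the reported base LR (0.0004) and weight decay (0.0125),
    # decayed over the quoted ~120K iterations.
    optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4, weight_decay=0.0125)
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda it: max(1.0 - it / max_iter, 0.0) ** 0.9)
    return optimizer, scheduler
```

In use, `scheduler.step()` would be called once per iteration (not per epoch), matching the iteration-based poly decay described in the paper.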