AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing
Authors: Qi Song, Kangfu Mei, Rui Huang
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have conducted extensive experiments on two semantic segmentation benchmarks, and our network achieves different levels of speed/accuracy trade-offs on Cityscapes, e.g., 71 FPS/79.9% mIoU, 130 FPS/78.5% mIoU, and 180 FPS/70.1% mIoU, and leading performance on ADE20K as well. We conducted extensive experiments on the two most competitive semantic segmentation datasets, i.e., Cityscapes (Cordts et al. 2016) and ADE20K (Zhou et al. 2017). |
| Researcher Affiliation | Academia | 1Shenzhen Institute of Artificial Intelligence and Robotics for Society 2The Chinese University of Hong Kong, Shenzhen 3Jilin University |
| Pseudocode | No | The paper includes architectural diagrams but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a link to open-source code or an explicit statement about its release. |
| Open Datasets | Yes | To evaluate the proposed approach, we conducted extensive experiments on the Cityscapes dataset (Cordts et al. 2016) and the ADE20K dataset (Zhou et al. 2017). |
| Dataset Splits | Yes | The 5000 images with fine annotations are further divided into 3 subsets of 2975, 500, and 1525 images for training, validation, and testing, respectively. |
| Hardware Specification | Yes | Specifically, our approach obtains 79.9%, 78.5%, and 70.1% mIoU scores on the Cityscapes test set while keeping a real-time speed of 71 FPS, 130 FPS, and 180 FPS respectively on GTX 1080Ti. |
| Software Dependencies | No | The paper mentions 'TensorRT for acceleration' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Training Settings. We train the network using standard SGD (Krizhevsky, Sutskever, and Hinton 2012). The minibatch size is set to 16 and 32 for Cityscapes and ADE20K respectively. We use a momentum of 0.9 and a weight decay of 5e-4. Similar to other works (Chen et al. 2017; Yu et al. 2018b), we apply the poly learning rate policy in which the initial learning rate is set to 1e-2 and decayed by (1 − iter/max_iter)^power with power = 0.9. The training images are augmented by employing random color jittering, random horizontal flipping, random cropping, and random scaling with 5 scales {0.75, 1.0, 1.5, 1.75, 2.0}. For Cityscapes, images are cropped to a size of 1024×1024, and the network is trained for 200k iterations. For ADE20K, a crop size of 512×512 and 250k training iterations are used. |
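
For reference, the training recipe quoted in the Experiment Setup row (SGD with momentum 0.9, weight decay 5e-4, and a poly learning rate decay from an initial 1e-2) can be expressed in standard deep learning tooling. The sketch below assumes PyTorch, which the paper does not name; the one-layer model stand-in and all identifiers are hypothetical, since the authors' code is not released.

```python
import torch

# Hypothetical stand-in for AttaNet; the authors' implementation is not public.
model = torch.nn.Conv2d(3, 19, kernel_size=1)

base_lr, power, max_iter = 1e-2, 0.9, 200_000  # Cityscapes schedule (250k for ADE20K)

# SGD hyperparameters as quoted from the paper's Training Settings.
optimizer = torch.optim.SGD(
    model.parameters(), lr=base_lr, momentum=0.9, weight_decay=5e-4
)

# Poly policy: lr = base_lr * (1 - iter / max_iter) ** power, applied per iteration.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda it: (1 - it / max_iter) ** power
)

# Inside the training loop, call scheduler.step() after each optimizer.step()
# so the learning rate decays once per iteration, not per epoch.
```

This schedule anneals the learning rate smoothly to zero over training, which is the convention the quoted passage attributes to prior work (Chen et al. 2017; Yu et al. 2018b).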