AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing

Authors: Qi Song, Kangfu Mei, Rui Huang (pp. 2567-2575)

AAAI 2021

Reproducibility assessment: each variable below is listed with its result and the supporting LLM response.
Research Type: Experimental. "We have conducted extensive experiments on two semantic segmentation benchmarks, and our network achieves different levels of speed/accuracy trade-offs on Cityscapes, e.g., 71 FPS/79.9% mIoU, 130 FPS/78.5% mIoU, and 180 FPS/70.1% mIoU, and leading performance on ADE20K as well." "We conducted extensive experiments on the two most competitive semantic segmentation datasets, i.e., Cityscapes (Cordts et al. 2016) and ADE20K (Zhou et al. 2017)."
Researcher Affiliation: Academia. (1) Shenzhen Institute of Artificial Intelligence and Robotics for Society; (2) The Chinese University of Hong Kong, Shenzhen; (3) Jilin University.
Pseudocode: No. The paper includes architectural diagrams but no explicit pseudocode or algorithm blocks.
Open Source Code: No. The paper does not provide a link to open-source code or an explicit statement about its release.
Open Datasets: Yes. "To evaluate the proposed approach, we conducted extensive experiments on the Cityscapes dataset (Cordts et al. 2016) and the ADE20K dataset (Zhou et al. 2017)."
Dataset Splits: Yes. "The 5000 images with fine annotations are further divided into 3 subsets of 2975, 500, and 1525 images for training, validation, and testing, respectively."
Hardware Specification: Yes. "Specifically, our approach obtains 79.9%, 78.5%, and 70.1% mIoU scores on the Cityscapes test set while keeping a real-time speed of 71 FPS, 130 FPS, and 180 FPS respectively on GTX 1080Ti."
Software Dependencies: No. The paper mentions "TensorRT for acceleration" but does not provide specific version numbers for any software dependencies.
Experiment Setup: Yes. "Training Settings. We train the network using standard SGD (Krizhevsky, Sutskever, and Hinton 2012). The minibatch size is set to 16 and 32 for Cityscapes and ADE20K respectively, and we use a momentum of 0.9 and a weight decay of 5e-4. Similar to other works (Chen et al. 2017; Yu et al. 2018b), we apply the poly learning rate policy in which the initial learning rate is set to 1e-2 and decayed by (1 - iter/max_iter)^power with power = 0.9. The training images are augmented by employing random color jittering, random horizontal flipping, random cropping, and random scaling with 5 scales {0.75, 1.0, 1.5, 1.75, 2.0}. For Cityscapes, images are cropped to a size of 1024x1024, and the network is trained for 200k iterations. For ADE20K, a crop size of 512x512 and 250k training iterations are used."
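The poly learning-rate policy quoted above can be sketched in a few lines (a minimal sketch; the function name and usage are ours, not from the paper):

```python
def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Poly learning-rate decay as described in the training settings:
    lr = base_lr * (1 - cur_iter / max_iter) ** power
    """
    return base_lr * (1.0 - cur_iter / max_iter) ** power

# Cityscapes settings from the paper: initial lr = 1e-2, 200k iterations, power = 0.9
lr_start = poly_lr(1e-2, 0, 200_000)        # 0.01 at the first iteration
lr_mid = poly_lr(1e-2, 100_000, 200_000)    # 0.01 * 0.5 ** 0.9 at the halfway point
lr_end = poly_lr(1e-2, 200_000, 200_000)    # decays to 0.0 at the final iteration
```

In practice this would be applied per iteration (not per epoch), matching the iteration-based schedule (200k for Cityscapes, 250k for ADE20K) described in the quote.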