Towards Real-Time Segmentation on the Edge

Authors: Yanyu Li, Changdi Yang, Pu Zhao, Geng Yuan, Wei Niu, Jiexiong Guan, Hao Tang, Minghai Qin, Qing Jin, Bin Ren, Xue Lin, Yanzhi Wang

AAAI 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments: Datasets and Metrics. Cityscapes. Cityscapes (Cordts, Omran et al. 2016, 2015) is a dataset of urban street scenes from the perspective of cars collected in 50 cities. It includes 5,000 finely annotated images, of which 2,975 are used for training, 500 for validation, and 1,525 for testing. We exclude coarse training data. This dataset has 30 label classes, 19 of which are used for segmentation. The image resolution is 2048 × 1024. and Experimental Results and Analysis: Based on our latency-driven search algorithm, we search on the proposed dual-branch backbone with mixed operators.
Researcher Affiliation Academia (1) Northeastern University; (2) College of William & Mary; (3) CVL, ETH Zurich
Pseudocode No The paper uses mathematical equations and block diagrams to illustrate components and processes, but does not include any structured pseudocode or algorithm blocks.
Open Source Code No The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets Yes Cityscapes. Cityscapes (Cordts, Omran et al. 2016, 2015) is a dataset of urban street scenes from the perspective of cars collected in 50 cities. and PASCAL Visual Object Classes (VOC) 2012 (Everingham et al. 2010) is a widely used dataset for semantic segmentation, classification, and object detection tasks. and ADE20K (Zhou et al. 2017, 2019) is a finely annotated image dataset for object segmentation and part segmentation.
Dataset Splits Yes For Cityscapes, 2,975 images are used for training, 500 for validation, and 1,525 for testing. and For PASCAL VOC 2012, there are 1,464 images for training and 1,449 images for validation.
Hardware Specification Yes We search and train the neural network on 8 NVIDIA RTX TITAN GPUs, with CUDA 11.1 and PyTorch 1.9. and Mobile latency is measured on the GPU of a Samsung Galaxy S21 smartphone, with the Qualcomm Snapdragon 888 mobile platform integrated with a Qualcomm Kryo 680 octa-core CPU and a Qualcomm Adreno 660 GPU.
Software Dependencies Yes We search and train the neural network on 8 NVIDIA RTX TITAN GPUs, with CUDA 11.1 and PyTorch 1.9.
Experiment Setup Yes We use the stochastic gradient descent (SGD) optimizer with momentum set to 0.9 and a batch size of 8 on each GPU. For Cityscapes, the learning rate is initially set to 0.1 with the poly policy. For PASCAL VOC 2012, we set the initial learning rate to 0.01. The learning rate value is determined as $(1 - \frac{iter}{total\_iter})^{0.9}$, where iter refers to the current iteration number. The pretraining of the supernet takes 160k iterations, while the search and fine-tune processes each take 40k iterations. We incorporate multiple random scaling {0.5, 0.75, 1.0, 2.0} and fixed-size cropping of 512 × 1024 as data augmentation for Cityscapes. The crop size is chosen based on the trade-off between mobile capacity and accuracy. To enhance the training, we also use color jitter and random horizontal flip. As for PASCAL VOC 2012, we randomly crop the input image to 513 × 513. We set hyperparameters β to 0.01, γ to 1.0, and λ to 0.001 in all experiments.
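
The poly learning-rate policy quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `poly_lr` and its parameters are hypothetical, and it assumes the usual convention that the decay factor multiplies the initial learning rate. The 40k-iteration horizon is borrowed from the fine-tune length quoted above purely as an example.

```python
def poly_lr(base_lr: float, iteration: int, total_iters: int, power: float = 0.9) -> float:
    """Poly decay: base_lr * (1 - iteration / total_iters) ** power.

    Hypothetical helper; multiplying the factor by base_lr is an assumption
    based on the common form of the poly policy.
    """
    if total_iters <= 0:
        raise ValueError("total_iters must be positive")
    # Clamp progress to [0, 1] so the factor never becomes negative.
    progress = min(max(iteration, 0), total_iters) / total_iters
    return base_lr * (1.0 - progress) ** power


# Initial learning rates quoted in the text.
CITYSCAPES_BASE_LR = 0.1
VOC_BASE_LR = 0.01
# Illustrative horizon (the quoted fine-tune length of 40k iterations).
TOTAL_ITERS = 40_000

if __name__ == "__main__":
    for it in (0, 10_000, 20_000, 30_000, 40_000):
        print(it, poly_lr(CITYSCAPES_BASE_LR, it, TOTAL_ITERS), poly_lr(VOC_BASE_LR, it, TOTAL_ITERS))
```

With an exponent of 0.9 the schedule is close to linear decay, falling from the initial value to zero at the final iteration. In a PyTorch training loop the same factor could be supplied to a scheduler, but that wiring is left out here to keep the sketch dependency-free.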