Towards Real-Time Segmentation on the Edge
Authors: Yanyu Li, Changdi Yang, Pu Zhao, Geng Yuan, Wei Niu, Jiexiong Guan, Hao Tang, Minghai Qin, Qing Jin, Bin Ren, Xue Lin, Yanzhi Wang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments Datasets and Metrics Cityscapes. Cityscapes (Cordts, Omran et al. 2016, 2015) is a dataset of urban street scenes from the perspective of cars collected in 50 cities. It includes 5,000 finely annotated images, of which 2,975 are used for training, 500 for validation, and 1,525 for testing. We exclude coarse training data. This dataset has 30 label classes and 19 of them are used for segmentation. The resolution of the images is 2048 × 1024. and Experimental Results and Analysis Based on our latency-driven search algorithm, we search on the proposed dual branch backbone with mixed operators. |
| Researcher Affiliation | Academia | 1 Northeastern University; 2 College of William & Mary; 3 CVL, ETH Zurich |
| Pseudocode | No | The paper uses mathematical equations and block diagrams to illustrate components and processes, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Cityscapes. Cityscapes (Cordts, Omran et al. 2016, 2015) is a dataset of urban street scenes from the perspective of cars collected in 50 cities. and PASCAL Visual Object Classes (VOC) 2012 (Everingham et al. 2010) is a widely used dataset for semantic segmentation, classification, and object detection tasks. and ADE20K (Zhou et al. 2017, 2019) is a finely annotated image dataset for object segmentations and part segmentations. |
| Dataset Splits | Yes | For Cityscapes, 2,975 images are used for training, 500 for validation, and 1,525 for testing. and For PASCAL VOC 2012, there are 1,464 images for training and 1,449 images for validation. |
| Hardware Specification | Yes | We search and train the neural network on 8 NVIDIA RTX TITAN GPUs, with CUDA 11.1 and PyTorch 1.9. and Mobile latency is measured on the GPU of a Samsung Galaxy S21 smartphone, with Qualcomm Snapdragon 888 mobile platform integrated with Qualcomm Kryo 680 Octa-core CPU and a Qualcomm Adreno 660 GPU. |
| Software Dependencies | Yes | We search and train the neural network on 8 NVIDIA RTX TITAN GPUs, with CUDA 11.1 and PyTorch 1.9. |
| Experiment Setup | Yes | We use stochastic gradient descent (SGD) optimizer and momentum is set to 0.9, and set batch size to 8 on each GPU. For Cityscapes, the learning rate is set to 0.1 initially with poly policy. For PASCAL VOC 2012, we set initial learning rate as 0.01. Learning rate value is determined as $\left(1 - \frac{\text{iter}}{\text{total\_iter}}\right)^{0.9}$, where iter refers to the current iteration number. The pretraining of supernet takes 160k iterations, while the search and fine-tune process both take 40k iterations. We incorporate multiple random scaling {0.5, 0.75, 1.0, 2.0} and fixed size cropping of 512 × 1024 as data augmentation for Cityscapes. The crop size is chosen based on the trade-off between mobile capacity and accuracy. To enhance the training, we also use color jitter and random horizontal flip. As for PASCAL VOC 2012, we randomly crop the input image to 513 × 513. We set hyperparameters β as 0.01, γ as 1.0 and λ to be 0.001 in all experiments. |
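
The poly learning-rate policy quoted in the Experiment Setup row can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `poly_lr` and its parameters are hypothetical. The sketch assumes the usual reading of the policy, in which the quoted factor multiplies the initial learning rate. It also clamps the iteration counter to the valid range, which the source does not mention.

```python
def poly_lr(base_lr: float, cur_iter: int, total_iter: int, power: float = 0.9) -> float:
    """Return the learning rate at cur_iter under a poly decay schedule.

    Assumption: the factor (1 - iter / total_iter) ** power scales the
    initial learning rate base_lr; the source quotes only the factor.
    """
    if total_iter <= 0:
        raise ValueError("total_iter must be positive")
    # Clamp the counter so the factor stays in [0, 1] even if the caller
    # passes a negative value or overshoots total_iter.
    progress = min(max(cur_iter, 0), total_iter) / total_iter
    return base_lr * (1.0 - progress) ** power


if __name__ == "__main__":
    # Cityscapes values quoted above: initial rate 0.1; 40k iterations is
    # the reported length of the search and fine-tune stages.
    for it in (0, 10_000, 20_000, 30_000, 40_000):
        print(it, poly_lr(0.1, it, 40_000))
```

With a power of 0.9 the decay is close to linear, and the rate reaches zero only at the final iteration.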