SM-NAS: Structural-to-Modular Neural Architecture Search for Object Detection
Authors: Lewei Yao, Hang Xu, Wei Zhang, Xiaodan Liang, Zhenguo Li (pp. 12661-12668)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The resulting architectures dominate state-of-the-art object detection systems in both inference time and accuracy and demonstrate their effectiveness on multiple detection datasets, e.g. halving the inference time with an additional 1% mAP improvement compared to FPN and reaching 46% mAP with similar inference time to Mask R-CNN. |
| Researcher Affiliation | Collaboration | Lewei Yao,1 Hang Xu,1 Wei Zhang,1 Xiaodan Liang,2 Zhenguo Li1 — 1Huawei Noah's Ark Lab, 2Sun Yat-Sen University |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We conduct architecture search on the well-known COCO (Lin et al. 2014) dataset, which contains 80 object classes with 118K images for training and 5K for evaluation. Pre-trained models on ImageNet (Russakovsky et al. 2015) are used as our backbone for fast convergence. Extensive experiments are conducted on the widely used detection benchmarks, including Pascal VOC (Everingham et al. 2010), COCO (Lin et al. 2014), and BDD (Yu et al. 2018). |
| Dataset Splits | Yes | Together with the accuracy on the validation dataset, a Pareto front is then generated showing the optimal structures of the detector under different resource constraints. Table 3 shows an ablative study of FPN with ResNet-50 trained with different strategies, evaluated on COCO val. For the PASCAL VOC dataset (Everingham et al. 2010) with 20 object classes, training is performed on the union of VOC 2007 trainval and VOC 2012 trainval (10K images) and evaluation is on VOC 2007 test (4.9K images). |
| Hardware Specification | Yes | Inference time is tested on one V100 GPU. All experiments use PyTorch (Paszke et al. 2017; Chen et al. 2018a) on multiple computational nodes with 8 V100 cards on each server. |
| Software Dependencies | Yes | All experiments use PyTorch (Paszke et al. 2017; Chen et al. 2018a) on multiple computational nodes with 8 V100 cards on each server. All experiments are performed under CUDA 9.0 and cuDNN 7.0. |
| Experiment Setup | Yes | During architecture evaluation, we use the SGD optimizer with a cosine-decay learning rate from 0.04 to 0.0001, momentum 0.9, and weight decay 10^-4. We use a cosine-decay learning rate ranging from 0.24 to 0.0001 with batch size 8 on each GPU. Then stochastic gradient descent (SGD) is performed to train the full model on 8 GPUs with 4 images on each GPU. Following the setting of the 2x schedule (He, Girshick, and Dollár 2018), the initial learning rate is 0.04 (with a linear warm-up) and is reduced twice (×0.1) during finetuning; weight decay is 10^-4 and momentum is 0.9. |
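The cosine-decay schedule with linear warm-up quoted above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the warm-up length (`warmup_steps`) and total step count are hypothetical values, since the paper does not state them, and the assumption that warm-up ramps linearly from the minimum learning rate is ours.

```python
import math

def cosine_lr(step, total_steps, lr_max=0.04, lr_min=0.0001, warmup_steps=500):
    """Learning rate at a given step: linear warm-up, then cosine decay.

    lr_max/lr_min (0.04 -> 0.0001) and momentum-free schedule shape follow the
    paper's stated setup; warmup_steps and total_steps are assumed for illustration.
    """
    if step < warmup_steps:
        # Linear warm-up toward the peak learning rate.
        return lr_min + (lr_max - lr_min) * step / warmup_steps
    # Cosine decay from lr_max down to lr_min over the remaining steps.
    t = (step - warmup_steps) / (total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

# Schedule endpoints: peak at the end of warm-up, minimum at the final step.
print(round(cosine_lr(500, 10000), 4))    # 0.04
print(round(cosine_lr(10000, 10000), 4))  # 0.0001
```

In a PyTorch training loop this would typically be realized with `torch.optim.SGD(momentum=0.9, weight_decay=1e-4)` plus a scheduler; the standalone function above just makes the shape of the schedule explicit.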