Computation Reallocation for Object Detection

Authors: Feng Liang, Chen Lin, Ronghao Guo, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show the effectiveness of our approach. Our CR-ResNet50 and CR-MobileNetV2 outperform the baseline by 1.9% and 1.7% COCO AP respectively, without any additional computation budget. The models discovered by CR-NAS can be equipped to other powerful detection necks/heads and be easily transferred to other datasets, e.g. PASCAL VOC, and other vision tasks, e.g. instance segmentation. Our CR-NAS can be used as a plugin to improve the performance of various networks, which is demanding.
Researcher Affiliation | Collaboration | 1 SenseTime Research Group {liangfeng,linchen,guoronghao,sunming1,wuwei,yanjunjie}@sensetime.com; 2 The University of Sydney, wanli.ouyang@sydney.edu.au
Pseudocode | Yes | Algorithm 1: Greedy operation search algorithm.
Input: number of blocks B; possible operation set of each block O = {O_i | i = 1, 2, ..., B}; supernet with trained weights N(O, W); dataset for validation D_val; evaluation metric AP_val.
Output: best architecture o*.
Initialize top-K partial architectures p = Ø
for i = 1, 2, ..., B do
    p_extend = p ⊗ O_i   (⊗ denotes the Cartesian product)
    result = {(arch, AP) | arch ∈ p_extend, AP = evaluate(arch)}
    p = choose_top_K(result)
end
Output: best architecture o* = choose_top1(p)
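For illustration, a minimal Python sketch of the greedy top-K search described in Algorithm 1. The paper does not release reference code, so the `op_sets` structure and the `evaluate` callable (which would score a partial architecture on archval with the shared supernet weights) are hypothetical placeholders.

```python
def greedy_operation_search(op_sets, evaluate, top_k=5):
    """Block-wise greedy search over per-block operation choices (Algorithm 1 sketch).

    op_sets  -- list of length B; op_sets[i] is the candidate operation set O_i for block i
    evaluate -- callable mapping a partial architecture (tuple of ops) to its validation
                AP under the shared supernet weights (placeholder, assumed to exist)
    top_k    -- number K of partial architectures kept after extending each block
    """
    partial = [()]  # start from the single empty partial architecture
    for ops_i in op_sets:
        # Cartesian product: extend every kept partial architecture with every candidate op
        extended = [arch + (op,) for arch in partial for op in ops_i]
        # Score each extension on the architecture-validation split and keep the best K
        scored = sorted(((arch, evaluate(arch)) for arch in extended),
                        key=lambda pair: pair[1], reverse=True)
        partial = [arch for arch, _ in scored[:top_k]]
    # After the last block, the kept architectures are complete; return the best one
    return partial[0]
```

With B blocks and a handful of candidate operations per block, this evaluates at most K times the per-block candidate count at each step, rather than the full Cartesian product of all blocks.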
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for their methodology is open source or publicly available.
Open Datasets | Yes | We evaluate our method on the challenging MS COCO benchmark (Lin et al., 2014). We split the 135K training images trainval135 into 130K images archtrain and 5K images archval. First, we train the supernet using archtrain and evaluate the architecture using archval. After the architecture is obtained, we follow other standard detectors (Ren et al., 2015; Lin et al., 2017a) in using ImageNet (Russakovsky et al., 2015) for pre-training the weights of this architecture. The final model is fine-tuned on the whole COCO trainval135 and validated on COCO minival. Another detection dataset, VOC (Everingham et al., 2015), is also used.
Dataset Splits | Yes | We split the 135K training images trainval135 into 130K images archtrain and 5K images archval. First, we train the supernet using archtrain and evaluate the architecture using archval. After the architecture is obtained, we follow other standard detectors (Ren et al., 2015; Lin et al., 2017a) in using ImageNet (Russakovsky et al., 2015) for pre-training the weights of this architecture. The final model is fine-tuned on the whole COCO trainval135 and validated on COCO minival.
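A minimal sketch of how such an image-level split could be reproduced from the COCO training annotations. The 130K/5K sizes follow the quoted description; the file paths, random seed, and split criterion are assumptions, since the paper does not state how the 5K archval images were chosen.

```python
import json
import random

# Divide the COCO training annotations into a supernet-training set (archtrain, ~130K
# images) and a held-out architecture-validation set (archval, 5K images).
# The annotation file name below is a placeholder, not a path given in the paper.
with open("annotations/instances_trainval135.json") as f:
    coco = json.load(f)

image_ids = [img["id"] for img in coco["images"]]
random.seed(0)              # deterministic split; the paper's selection rule is unknown
random.shuffle(image_ids)

archval_ids = set(image_ids[:5000])     # 5K held-out images -> archval
archtrain_ids = set(image_ids[5000:])   # remaining ~130K images -> archtrain

def subset(coco_dict, keep_ids):
    """Restrict a COCO-style annotation dict to the given image ids."""
    return {
        **coco_dict,
        "images": [im for im in coco_dict["images"] if im["id"] in keep_ids],
        "annotations": [a for a in coco_dict["annotations"] if a["image_id"] in keep_ids],
    }

for name, ids in (("archtrain", archtrain_ids), ("archval", archval_ids)):
    with open(f"annotations/{name}.json", "w") as f:
        json.dump(subset(coco, ids), f)
```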
Hardware Specification | Yes | We use multi-GPU training over 8 1080Ti GPUs with total batch size 16.
Software Dependencies | No | The paper mentions using stochastic gradient descent (SGD) and synchronized Batch Norm (SyncBN), but it does not specify version numbers for any software dependencies such as PyTorch, TensorFlow, or Python.
Experiment Setup | Yes | For the training of our searched models, the input images are resized to have a short side of 800 pixels or a long side of 1333 pixels. We use stochastic gradient descent (SGD) as the optimizer with 0.9 momentum and 0.0001 weight decay. For fair comparison, all our models are trained for 13 epochs, known as the 1× schedule (Girshick et al., 2018). We use multi-GPU training over 8 1080Ti GPUs with total batch size 16. The initial learning rate is 0.00125 per image and is divided by 10 at 8 and 11 epochs. Warm-up and synchronized Batch Norm (SyncBN) (Peng et al., 2018) are adopted for both the baselines and our searched models.
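As a concrete reading of these hyperparameters, a minimal PyTorch-style sketch of the optimizer and learning-rate schedule. The stand-in model, warm-up length, and SyncBN conversion call are assumptions for illustration; the paper provides no reference training script.

```python
import torch

# Stand-in module; in the paper this would be the searched CR-ResNet50/CR-MobileNetV2 detector.
model = torch.nn.Conv2d(3, 256, kernel_size=3)

# 0.00125 per image with a total batch size of 16 over 8 GPUs -> base learning rate 0.02.
base_lr = 0.00125 * 16
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, weight_decay=1e-4)

# 1x schedule: 13 epochs in total, learning rate divided by 10 after epochs 8 and 11.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[8, 11], gamma=0.1)

for epoch in range(13):
    # Per-iteration training with images resized to an 800-pixel short side / 1333-pixel
    # long side would run here; a linear warm-up is applied to the first iterations
    # (warm-up length is not quoted) and SyncBN replaces per-GPU BatchNorm, e.g. via
    # torch.nn.SyncBatchNorm.convert_sync_batchnorm(model) in a distributed setup.
    scheduler.step()
```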