CBNet: A Novel Composite Backbone Network Architecture for Object Detection

Authors: Yudong Liu, Yongtao Wang, Siwei Wang, Tingting Liang, Qijie Zhao, Zhi Tang, Haibin Ling (pp. 11653-11660)

AAAI 2020

Reproducibility Variable Result LLM Response
Research Type Experimental On the widely tested MS-COCO benchmark (Lin et al. 2014), we conduct experiments by applying the proposed Composite Backbone Network to several state-of-the-art object detectors, such as FPN (Lin et al. 2017a), Mask R-CNN (He et al. 2017) and Cascade R-CNN (Cai and Vasconcelos 2018). Experimental results show that the mAPs of all the detectors consistently increase by 1.5 to 3.0 points, which demonstrates the effectiveness of our Composite Backbone Network.
Researcher Affiliation Academia Yudong Liu (1), Yongtao Wang (1), Siwei Wang (1), Tingting Liang (1), Qijie Zhao (1), Zhi Tang (1), Haibin Ling (2) -- (1) Wangxuan Institute of Computer Technology, Peking University; (2) Department of Computer Science, Stony Brook University
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code Yes Code will be made available at https://github.com/PKUbahuangliuhe/CBNet.
Open Datasets Yes On the widely tested MS-COCO benchmark (Lin et al. 2014), we conduct experiments by applying the proposed Composite Backbone Network to several state-of-the-art object detectors
Dataset Splits Yes Following the protocol in MS-COCO, we use the trainval35k set for training, which is a union of 80k images from the train split and a random 35k subset of images from the 40k image validation split.
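The quoted split can be made concrete with a small sketch. The helper below builds a trainval35k-style split by holding out a random subset of the validation IDs and merging the rest with the train IDs; the function name, the `minival_size` parameter, and the use of a seeded RNG are assumptions for illustration, not details stated in the paper.

```python
import random

def make_trainval35k(train_ids, val_ids, minival_size=5000, seed=0):
    """Sketch of the quoted protocol: hold out `minival_size` random
    validation images, and merge the remaining validation images with
    the full train split for training."""
    rng = random.Random(seed)  # seeded for reproducibility (an assumption)
    minival = set(rng.sample(val_ids, minival_size))
    trainval = list(train_ids) + [i for i in val_ids if i not in minival]
    return trainval, sorted(minival)
```

With the real MS-COCO sizes (80k train, 40k val, 5k held out), this yields the 115k-image trainval35k set the quote describes.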
Hardware Specification Yes We conduct experiments on a machine with 4 NVIDIA Titan X GPUs, CUDA 9.2 and cuDNN 7.1.4 for most experiments. In addition, we train Cascade Mask R-CNN with Dual-ResNeXt152 on a machine with 4 NVIDIA P40 GPUs and Cascade Mask R-CNN with Triple-ResNeXt152 on a machine with 4 NVIDIA V100 GPUs.
Software Dependencies Yes We conduct experiments on a machine with 4 NVIDIA Titan X GPUs, CUDA 9.2 and cuDNN 7.1.4 for most experiments. Baseline methods in this paper are reproduced by ourselves based on the Detectron framework (Girshick et al. 2018).
Experiment Setup Yes Specifically, the short side of the input image is resized to 800, and the longer side is limited to 1,333. The data augmentation is simply flipping the images. For most of the original baselines, the batch size on a single GPU is two images. Due to the limitation of GPU memory for CBNet, we put one image on each GPU for training the detectors using CBNet. Meanwhile, we set the initial learning rate to half of the default value and train for the same number of epochs as the original baselines.
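The resize rule in the quoted setup (short side to 800, long side capped at 1,333) can be sketched as a scale-factor computation; the function name and exact rounding behavior are assumptions for illustration, not from the paper.

```python
def resize_scale(height, width, short_side=800, max_long_side=1333):
    """Scale factor for the quoted rule: resize so the short side
    becomes `short_side`, unless that would push the long side past
    `max_long_side`, in which case the long side is capped instead."""
    scale = short_side / min(height, width)
    if scale * max(height, width) > max_long_side:
        scale = max_long_side / max(height, width)
    return scale
```

For a 480x640 image this gives 800/480 (the short-side rule applies); for a very wide 500x2000 image the cap kicks in and the scale is 1333/2000.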