EHSOD: CAM-Guided End-to-End Hybrid-Supervised Object Detection with Cascade Refinement

Authors: Linpu Fang, Hang Xu, Zhili Liu, Sarah Parisot, Zhenguo Li (pp. 10778-10785)

AAAI 2020

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate the effectiveness of the proposed method, and it achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data, e.g. 37.5% mAP on COCO. We evaluate the performance of our proposed EHSOD method on two common detection benchmarks: the PASCAL VOC 2007 (Everingham et al. 2015) and the MS-COCO 2017 dataset (Lin et al. 2014).
Researcher Affiliation Collaboration Linpu Fang (1), Hang Xu (2), Zhili Liu (2), Sarah Parisot (2), Zhenguo Li (2); 1: South China University of Technology, 2: Huawei Noah's Ark Lab
Pseudocode No The paper does not contain any pseudocode or algorithm blocks.
Open Source Code No We will release the code and the trained models.
Open Datasets Yes We evaluate the performance of our proposed EHSOD method on two common detection benchmarks: the PASCAL VOC 2007 (Everingham et al. 2015) and the MS-COCO 2017 dataset (Lin et al. 2014).
Dataset Splits Yes The MS-COCO dataset has 80 object classes and is divided into a train set (118K images), val set (5K images) and test set (20K unannotated images). For PASCAL VOC 2007, we choose the trainval set (5,011 images) for training and the test set (4,952 images) for testing.
Hardware Specification Yes All experiments are conducted on a single server with 8 Tesla V100 GPUs by using the Pytorch framework.
Software Dependencies No The paper mentions 'Pytorch framework' but does not specify a version number.
Experiment Setup Yes We set the loss weights α1 and α2 in L_CAM-RPN to 0.1 and 0.2 respectively, set the loss weights λ1, λ2 and λ3 for the three hybrid-supervised heads to 1, 0.5 and 0.25 respectively, and set all other loss weights to 1. The scale factor σ for generating the positive region of the ground-truth CAM is set to 0.8. The hyper-parameters α and γ for the focal loss in L_CAM-seg are set to 0.25 and 2 respectively. For training, SGD with weight decay of 0.0001 and momentum of 0.9 is adopted to optimize all models. For the PASCAL VOC dataset, the batch size is 8 with 4 images per GPU; the initial learning rate is 0.005, reduced by a factor of 0.1 at epoch 9. For the MS-COCO dataset, the batch size is 16 with 2 images per GPU; the initial learning rate is 0.01, reduced by a factor of 0.1 at epochs 8 and 11. We train all models end-to-end for only 12 epochs.
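The step learning-rate schedule quoted above can be sketched in a few lines. This is a minimal illustration of the reported hyper-parameters, not the authors' released code; the helper name `lr_at_epoch` is our own, and the values (base learning rates, decay milestones, 12-epoch budget) are taken directly from the Experiment Setup quote.

```python
def lr_at_epoch(epoch, base_lr, milestones, gamma=0.1):
    """Step schedule: multiply base_lr by gamma after each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# PASCAL VOC: batch size 8 (4 images/GPU), lr 0.005, decayed at epoch 9
voc_lrs = [lr_at_epoch(e, 0.005, milestones=[9]) for e in range(12)]

# MS-COCO: batch size 16 (2 images/GPU), lr 0.01, decayed at epochs 8 and 11
coco_lrs = [lr_at_epoch(e, 0.01, milestones=[8, 11]) for e in range(12)]
```

In a PyTorch training loop this corresponds to `torch.optim.SGD(..., lr=base_lr, momentum=0.9, weight_decay=1e-4)` combined with a `MultiStepLR` scheduler using the same milestones.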