EHSOD: CAM-Guided End-to-End Hybrid-Supervised Object Detection with Cascade Refinement
Authors: Linpu Fang, Hang Xu, Zhili Liu, Sarah Parisot, Zhenguo Li
AAAI 2020, pp. 10778–10785 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of the proposed method, and it achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data, e.g. 37.5% mAP on COCO. We evaluate the performance of our proposed EHSOD method on two common detection benchmarks: the PASCAL VOC 2007 (Everingham et al. 2015) and the MS-COCO 2017 dataset (Lin et al. 2014). |
| Researcher Affiliation | Collaboration | Linpu Fang (1), Hang Xu (2), Zhili Liu (2), Sarah Parisot (2), Zhenguo Li (2); (1) South China University of Technology, (2) Huawei Noah's Ark Lab |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | We will release the code and the trained models. |
| Open Datasets | Yes | We evaluate the performance of our proposed EHSOD method on two common detection benchmarks: the PASCAL VOC 2007 (Everingham et al. 2015) and the MS-COCO 2017 dataset (Lin et al. 2014). |
| Dataset Splits | Yes | The MS-COCO dataset has 80 object classes and is divided into a train set (118K images), a val set (5K images), and a test set (20K unannotated images). For PASCAL VOC 2007, we choose the trainval set (5,011 images) for training and the test set (4,952 images) for testing. |
| Hardware Specification | Yes | All experiments are conducted on a single server with 8 Tesla V100 GPUs by using the Pytorch framework. |
| Software Dependencies | No | The paper mentions the 'Pytorch framework' but does not specify a version number. |
| Experiment Setup | Yes | We set the loss weights α1 and α2 in L_CAM-RPN to 0.1 and 0.2 respectively, set the loss weights λ1, λ2, and λ3 for the three hybrid-supervised heads to 1, 0.5, and 0.25 respectively, and set all other loss weights to 1. The scale factor σ for generating the positive region of the ground-truth CAM is set to 0.8. The hyper-parameters α and γ for the focal loss in L_CAM-seg are set to 0.25 and 2 respectively. For training, SGD with weight decay of 0.0001 and momentum of 0.9 is adopted to optimize all models. For the PASCAL VOC dataset, the batch size is set to 8 with 4 images on each GPU; the initial learning rate is 0.005, reduced by a factor of 0.1 at epoch 9. For the MS-COCO dataset, the batch size is set to 16 with 2 images on each GPU; the initial learning rate is 0.01, reduced by a factor of 0.1 at epochs 8 and 11. We train all models end-to-end for only 12 epochs. |
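The step-decay learning-rate schedule quoted above (initial rate multiplied by 0.1 at the stated milestone epochs, 12 epochs total) can be sketched as a small helper. This is an illustrative reconstruction, not code from the paper's (unreleased) implementation; the function name and defaults are assumptions matching the MS-COCO settings.

```python
# Illustrative sketch of the paper's step-decay schedule (MS-COCO defaults:
# base lr 0.01, decay by 0.1 at epochs 8 and 11, 12 epochs total).
# The function name `lr_at_epoch` is hypothetical, not from the paper.

def lr_at_epoch(epoch, base_lr=0.01, milestones=(8, 11), gamma=0.1):
    """Return the learning rate for a given epoch under a multi-step schedule.

    The rate is base_lr multiplied by gamma once for every milestone
    the epoch has reached or passed.
    """
    n_decays = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** n_decays

# Full 12-epoch schedule: 0.01 for epochs 0-7, 0.001 for 8-10, 0.0001 for 11.
schedule = [lr_at_epoch(e) for e in range(12)]
```

For the PASCAL VOC setting one would call `lr_at_epoch(e, base_lr=0.005, milestones=(9,))` instead, matching the single decay at epoch 9.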