EHSOD: CAM-Guided End-to-End Hybrid-Supervised Object Detection with Cascade Refinement

Authors: Linpu Fang, Hang Xu, Zhili Liu, Sarah Parisot, Zhenguo Li (pp. 10778-10785)

AAAI 2020

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate the effectiveness of the proposed method, and it achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data, e.g. 37.5% mAP on COCO. We evaluate the performance of our proposed EHSOD method on two common detection benchmarks: the PASCAL VOC 2007 (Everingham et al. 2015) and the MS-COCO 2017 dataset (Lin et al. 2014).
Researcher Affiliation Collaboration Linpu Fang (1), Hang Xu (2), Zhili Liu (2), Sarah Parisot (2), Zhenguo Li (2); 1: South China University of Technology, 2: Huawei Noah's Ark Lab
Pseudocode No The paper does not contain any pseudocode or algorithm blocks.
Open Source Code No We will release the code and the trained models.
Open Datasets Yes We evaluate the performance of our proposed EHSOD method on two common detection benchmarks: the PASCAL VOC 2007 (Everingham et al. 2015) and the MS-COCO 2017 dataset (Lin et al. 2014).
Dataset Splits Yes The MS-COCO dataset has 80 object classes and is divided into a train set (118K images), val set (5K images) and test set (20K unannotated images). For PASCAL VOC 2007, we choose the trainval set (5,011 images) for training and the test set (4,952 images) for testing.
Hardware Specification Yes All experiments are conducted on a single server with 8 Tesla V100 GPUs by using the Pytorch framework.
Software Dependencies No The paper mentions 'Pytorch framework' but does not specify a version number.
Experiment Setup Yes We set the loss weights α1 and α2 in L_CAM-RPN to 0.1 and 0.2 respectively, set the loss weights λ1, λ2 and λ3 for the three hybrid-supervised heads to 1, 0.5 and 0.25 respectively, and set all other loss weights to 1. The scale factor σ for generating the positive region of the ground-truth CAM is set to 0.8. The hyper-parameters α and γ for the focal loss in L_CAM-seg are set to 0.25 and 2 respectively. For training, SGD with weight decay of 0.0001 and momentum of 0.9 is adopted to optimize all models. For the PASCAL VOC dataset, the batch size is 8 with 4 images per GPU; the initial learning rate is 0.005, reduced by a factor of 0.1 at epoch 9. For the MS-COCO dataset, the batch size is 16 with 2 images per GPU; the initial learning rate is 0.01, reduced by a factor of 0.1 at epochs 8 and 11. We train all models end-to-end for only 12 epochs.
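The step learning-rate schedule quoted above can be sketched in a few lines. This is a minimal illustration of the reported hyper-parameters, not the authors' released code; the helper name `lr_at_epoch` is our own, and the values (base learning rates, decay milestones, 12-epoch budget) are taken directly from the Experiment Setup quote.

```python
def lr_at_epoch(epoch, base_lr, milestones, gamma=0.1):
    """Step schedule: multiply base_lr by gamma after each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# PASCAL VOC: batch size 8 (4 images/GPU), lr 0.005, decayed at epoch 9
voc_lrs = [lr_at_epoch(e, 0.005, milestones=[9]) for e in range(12)]

# MS-COCO: batch size 16 (2 images/GPU), lr 0.01, decayed at epochs 8 and 11
coco_lrs = [lr_at_epoch(e, 0.01, milestones=[8, 11]) for e in range(12)]
```

In a PyTorch training loop this corresponds to `torch.optim.SGD(..., lr=base_lr, momentum=0.9, weight_decay=1e-4)` combined with a `MultiStepLR` scheduler using the same milestones.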