Relational Learning for Joint Head and Human Detection

Authors: Cheng Chi, Shifeng Zhang, Junliang Xing, Zhen Lei, Stan Z. Li, Xudong Zou

AAAI 2020, pp. 10647-10654

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of the proposed method, we annotate head bounding boxes of the CityPersons and Caltech-USA datasets, and conduct extensive experiments on the CrowdHuman, CityPersons and Caltech-USA datasets. As a consequence, the proposed JointDet detector achieves state-of-the-art performance on these three benchmarks.
Researcher Affiliation | Academia | 1) Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; 2) CBSR & NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing, China; 3) University of Chinese Academy of Sciences, Beijing, China; 4) Macau University of Science and Technology, Macao, China
Pseudocode | Yes | Algorithm 1: Relationship Discriminating Module
Open Source Code | Yes | To facilitate further studies on the head and human detection problem, all new annotations, source codes and trained models are available at https://github.com/ChiCheng123/JointDet.
Open Datasets | Yes | To facilitate further studies on the head and human detection problem, all new annotations, source codes and trained models are available at https://github.com/ChiCheng123/JointDet.
Dataset Splits | Yes | CrowdHuman is a benchmark dataset to better evaluate detectors in crowd scenarios. It is large, richly annotated and highly diverse, containing 15,000, 4,370 and 5,000 images in the training, validation and testing subsets, respectively.
Hardware Specification | Yes | The proposed JointDet is trained on 16 GTX 1080Ti GPUs with a mini-batch of 2 per GPU for CrowdHuman and Caltech-USA, and a mini-batch of 1 per GPU for CityPersons.
Software Dependencies | No | The paper mentions implementing JointDet using the PyTorch library, but does not specify version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | During the training phase, the input images are resized so that their short edges are at 800 pixels while the long edges are no more than 1333 pixels. We train JointDet with an initial learning rate of 0.04 for the first 16 epochs, and decay it by 10 and 100 times for another 6 and 3 epochs. ... We fine-tune the model using SGD with 0.9 momentum and 0.0001 weight decay. The proposed JointDet is trained on 16 GTX 1080Ti GPUs with a mini-batch of 2 per GPU for CrowdHuman and Caltech-USA, and a mini-batch of 1 per GPU for CityPersons. Each mini-batch involves 512 RoIs per image. Multi-scale training and testing are not applied, to ensure fair comparisons with previous methods.
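The quoted experiment setup maps onto a standard PyTorch training recipe. The sketch below is a minimal, hypothetical reconstruction of that configuration, not the authors' released JointDet code: the `resize_keep_ratio` helper, the placeholder model, and the 16/22-epoch milestones are assumptions derived purely from the numbers quoted above.

```python
import torch
import torch.nn.functional as F

def resize_keep_ratio(image, short_edge=800, long_edge_max=1333):
    """Resize a CHW image tensor so its short edge is `short_edge` pixels,
    capping the long edge at `long_edge_max` (per the quoted setup).
    Illustrative helper; not from the released JointDet code."""
    h, w = image.shape[-2:]
    scale = short_edge / min(h, w)
    if max(h, w) * scale > long_edge_max:
        scale = long_edge_max / max(h, w)
    size = (int(round(h * scale)), int(round(w * scale)))
    return F.interpolate(image.unsqueeze(0), size=size,
                         mode="bilinear", align_corners=False).squeeze(0)

# Placeholder network standing in for the JointDet detector.
model = torch.nn.Conv2d(3, 64, kernel_size=3)

# SGD with 0.9 momentum and 0.0001 weight decay, initial LR 0.04 (as quoted).
optimizer = torch.optim.SGD(model.parameters(), lr=0.04,
                            momentum=0.9, weight_decay=1e-4)

# LR 0.04 for epochs 0-15, then divided by 10 for 6 epochs and by 100
# for the final 3 epochs (25 epochs total), matching the quoted schedule.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[16, 22], gamma=0.1)
```

Under this reading, the paper's "decay by 10 and 100 times for another 6 and 3 epochs" corresponds to a 25-epoch run with step decays at epochs 16 and 22; the milestone placement is an inference from the quoted text, not a confirmed detail.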