Beta R-CNN: Looking into Pedestrian Detection from Another Perspective

Authors: Zixuan Xu, Banghuai Li, Ye Yuan, Anhong Dang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the extremely crowded benchmarks CrowdHuman [1] and CityPersons [2] show that our proposed approach can outperform the state-of-the-art results, which strongly validates the superiority of our method.
Researcher Affiliation | Collaboration | Zixuan Xu, Peking University, zixuanxu@pku.edu.cn; Banghuai Li, Megvii Research, libanghuai@megvii.com; Ye Yuan, Megvii Research, yuanye@megvii.com; Anhong Dang, Peking University, ahdang@pku.edu.cn
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | No | Code will be released at github.com/Guardian44x/Beta-R-CNN.
Open Datasets | Yes | CityPersons dataset: the CityPersons dataset [2] is a subset of Cityscapes that consists only of person annotations. There are 2975 images for training, 500 for validation, and 1575 for testing; the average number of pedestrians per image is 7. We evaluate our proposed method under the full-body setting, following the evaluation protocol in [2], and the partition of the validation set follows the standard setting in [19] based on visibility: Heavy [0, 0.65], Partial [0.65, 0.9], Bare [0.9, 1], Reasonable [0.65, 1]. CrowdHuman dataset: the CrowdHuman dataset [1] was recently released to specifically target the crowd issue in the human detection task. There are 15000, 4370, and 5000 images in the training, validation, and testing sets respectively.
Dataset Splits | Yes | CityPersons: 2975 images for training, 500 for validation, and 1575 for testing. CrowdHuman: 15000, 4370, and 5000 images in the training, validation, and testing sets respectively.
Hardware Specification | Yes | We take the CrowdHuman validation set with 800x1400 input size to conduct speed experiments on NVIDIA 2080Ti GPUs (8 GPUs), and the average speeds are 0.483 s/image (Cascade R-CNN baseline) and 0.487 s/image (Beta R-CNN) respectively.
Software Dependencies | No | No specific software dependencies with version numbers were mentioned.
Experiment Setup | Yes | For anchor settings, we follow the same anchor scales as in [30], while the aspect ratios are set to H:W = {1:1, 2:1, 3:1}. For training, the batch size is 16, split across 8 GPUs. Each training round includes 16000 iterations on CityPersons and 40000 iterations on CrowdHuman. The learning rate is initialized to 0.02 and divided by 10 at half and three-quarters of the total iterations respectively. During training, the sampling ratio of positive to negative proposals for the RoI branch is 1:1 for CrowdHuman and 1:4 for CityPersons.
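
The visibility partition quoted in the Open Datasets row (Heavy, Partial, Bare, Reasonable) can be illustrated with a small helper. This is a minimal sketch, not the authors' evaluation code: it assumes each annotation carries a full-body box and a visible box in (x, y, w, h) format and that visibility is computed as visible area divided by full-body area; all function and field names are hypothetical.

```python
# Minimal sketch of the CityPersons-style visibility partition quoted above.
# Assumptions (not from the paper): each annotation has "bbox_full" and
# "bbox_vis" boxes in (x, y, w, h) format, and visibility = visible / full area.

def box_area(box):
    """Area of an (x, y, w, h) box."""
    return max(box[2], 0) * max(box[3], 0)

def visibility(ann):
    """Fraction of the full-body box that is visible."""
    full = box_area(ann["bbox_full"])
    return box_area(ann["bbox_vis"]) / full if full > 0 else 0.0

# Subsets from the quoted protocol: Heavy [0, 0.65], Partial [0.65, 0.9],
# Bare [0.9, 1], Reasonable [0.65, 1]. Note that Reasonable overlaps the others.
SUBSETS = {
    "Heavy":      (0.00, 0.65),
    "Partial":    (0.65, 0.90),
    "Bare":       (0.90, 1.00),
    "Reasonable": (0.65, 1.00),
}

def partition(annotations):
    """Group validation annotations by visibility range."""
    groups = {name: [] for name in SUBSETS}
    for ann in annotations:
        v = visibility(ann)
        for name, (lo, hi) in SUBSETS.items():
            if lo <= v <= hi:
                groups[name].append(ann)
    return groups
```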
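
The Experiment Setup row specifies the optimization schedule precisely enough to sketch it. The snippet below is an illustrative reconstruction in PyTorch, not the released training code: the initial learning rate of 0.02 divided by 10 at one half and three quarters of the total iterations, the iteration counts, and the anchor aspect ratios come from the quote; the choice of SGD, the momentum, the weight decay, and the placeholder model are assumptions.

```python
# Sketch of the quoted training schedule (PyTorch). Values beyond the quote
# (SGD, momentum 0.9, weight decay 1e-4, dummy model) are assumptions.
import torch
from torch.optim.lr_scheduler import MultiStepLR

total_iters = 40000                      # 40000 on CrowdHuman, 16000 on CityPersons
model = torch.nn.Conv2d(3, 8, 3)         # placeholder for the detector

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.02,                             # initial learning rate from the quote
    momentum=0.9,                        # assumption, not stated in the quoted text
    weight_decay=1e-4,                   # assumption
)
# Divide the learning rate by 10 at 1/2 and 3/4 of the total iterations.
scheduler = MultiStepLR(
    optimizer,
    milestones=[total_iters // 2, total_iters * 3 // 4],
    gamma=0.1,
)

# Quoted anchor aspect ratios H:W = {1:1, 2:1, 3:1}.
aspect_ratios_h_over_w = [1.0, 2.0, 3.0]

for it in range(total_iters):
    # ... forward/backward pass over a batch of 16 images split across 8 GPUs ...
    optimizer.step()
    scheduler.step()                     # stepped per iteration since milestones are in iterations
```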
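
The Hardware Specification row reports average per-image inference times (0.483 s/image for the Cascade R-CNN baseline versus 0.487 s/image for Beta R-CNN) on the CrowdHuman validation set at 800x1400 input. A common way to obtain such numbers is sketched below; it is not the authors' benchmarking script, and the helper name, the model, and the inputs are placeholders.

```python
# Sketch of per-image GPU timing of the kind quoted above (hypothetical helper).
# Synchronize before reading the clock so pending CUDA kernels are included.
import time
import torch

@torch.no_grad()
def average_seconds_per_image(model, images, device="cuda"):
    model.to(device).eval()
    # Warm up so CUDA context creation and cuDNN autotuning are not timed.
    for img in images[:5]:
        model(img.to(device))
    torch.cuda.synchronize()
    start = time.perf_counter()
    for img in images:
        model(img.to(device))
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / len(images)

if __name__ == "__main__":
    model = torch.nn.Conv2d(3, 8, 3, padding=1)               # stand-in for the detector
    images = [torch.randn(1, 3, 800, 1400) for _ in range(20)]  # fake 800x1400 inputs
    print(f"{average_seconds_per_image(model, images):.3f} s/image")
```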