Beta R-CNN: Looking into Pedestrian Detection from Another Perspective
Authors: Zixuan Xu, Banghuai Li, Ye Yuan, Anhong Dang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the extremely crowded benchmark CrowdHuman [1] and CityPersons [2] show that our proposed approach can outperform the state-of-the-art results, which strongly validate the superiority of our method. |
| Researcher Affiliation | Collaboration | Zixuan Xu Peking University zixuanxu@pku.edu.cn; Banghuai Li Megvii Research libanghuai@megvii.com; Ye Yuan Megvii Research yuanye@megvii.com; Anhong Dang Peking University ahdang@pku.edu.cn |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | No | Code will be released at github.com/Guardian44x/Beta-R-CNN. |
| Open Datasets | Yes | CityPersons Dataset. The CityPersons dataset [2] is a subset of Cityscapes which only consists of person annotations. There are 2975 images for training, 500 and 1575 images for validation and testing. The average number of pedestrians in an image is 7. We evaluate our proposed method under the full-body setting, following the evaluation protocol in [2], and the partition of validation set follows the standard setting in [19] on account of visibility: Heavy [0, 0.65], Partial [0.65, 0.9], Bare [0.9, 1], Reasonable [0.65, 1]. CrowdHuman Dataset. The CrowdHuman dataset [1] has been recently released to specifically target the crowd issue in the human detection task. There are 15000, 4370, and 5000 images in the training, validation, and testing set respectively. |
| Dataset Splits | Yes | CityPersons Dataset. There are 2975 images for training, 500 and 1575 images for validation and testing... CrowdHuman Dataset. There are 15000, 4370, and 5000 images in the training, validation, and testing set respectively. |
| Hardware Specification | Yes | We take CrowdHuman validation set with 800x1400 input size to conduct speed experiments on NVIDIA 2080Ti GPU with 8 GPUs, and the average speeds are 0.483s/image (Cascade R-CNN baseline) and 0.487s/image (Beta R-CNN) respectively. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned. |
| Experiment Setup | Yes | As for anchor settings, we follow the same anchor scales in [30], while the aspect ratios are set to H : W = {1 : 1, 2 : 1, 3 : 1}. For training, the batch size is 16, split to 8 GPUs. Each training round includes 16000 iterations on CityPersons and 40000 iterations on CrowdHuman. The learning rate is initialized to 0.02 and divided by 10 at half and three-quarters of total iterations respectively. During training, the sampling ratio of positive to negative proposals for RoI branch is 1 : 1 for CrowdHuman and 1 : 4 for CityPersons. |
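
The training schedule quoted in the Experiment Setup row can be written out as a small sketch. This is not the authors' code: the function name `learning_rate`, the constant names, and the choice to apply each drop exactly at the boundary iteration are assumptions made for illustration. Only the numbers (base rate 0.02, drops at one half and three quarters of training, 16000 and 40000 iterations, batch size 16 over 8 GPUs) come from the quoted text.

```python
# Values taken from the quoted experiment setup.
TOTAL_ITERS = {"CityPersons": 16000, "CrowdHuman": 40000}
BATCH_SIZE = 16
NUM_GPUS = 8
IMAGES_PER_GPU = BATCH_SIZE // NUM_GPUS  # 16 images split evenly over 8 GPUs


def learning_rate(iteration, total_iters, base_lr=0.02):
    """Step schedule: base_lr, divided by 10 at 1/2 and again at 3/4 of training.

    Assumption: the drop takes effect at the boundary iteration itself;
    the source does not say whether the boundary is inclusive.
    """
    if not 0 <= iteration < total_iters:
        raise ValueError("iteration must lie in [0, total_iters)")
    lr = base_lr
    if iteration >= total_iters // 2:
        lr /= 10
    if iteration >= (3 * total_iters) // 4:
        lr /= 10
    return lr


if __name__ == "__main__":
    for name, total in TOTAL_ITERS.items():
        # Sample the rate at the start, the first drop, and the second drop.
        points = [0, total // 2, (3 * total) // 4]
        print(name, [(i, learning_rate(i, total)) for i in points])
```

Under these assumptions, the CityPersons run would drop its rate at iterations 8000 and 12000, and the CrowdHuman run at 20000 and 30000.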