Improving Crowded Object Detection via Copy-Paste

Authors: Jiangfan Deng, Dewen Fan, Xiaosong Qiu, Feng Zhou

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our approach can easily improve the state-of-the-art detector in typical crowded detection task by more than 2% without any bells and whistles.
Researcher Affiliation | Industry | Algorithm Research, Aibee Inc. jfdeng100@foxmail.com, {dwfan,xsqiu,fzhou}@aibee.com
Pseudocode | Yes | Algorithm 1: Overlay Depth-aware NMS (a hedged, speculative sketch of a depth-ordered suppression step is given after this table)
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | Pedestrian detection is the most typical task burdened by the crowdedness problem, so our experiments are conducted mainly on two datasets: CrowdHuman (Shao et al. 2018) and CityPersons (Zhang, Benenson, and Schiele 2017). [...] we prepare another sparse training set by re-labeling full body box of persons in COCO (Lin et al. 2014) to further evaluate the potential of our method. We name this train set as COCO-fullperson (we will release this dataset). Moreover, we use the category of car in KITTI (Geiger, Lenz, and Urtasun 2012) to further estimate the generality.
Dataset Splits | Yes | Since both the training and validation data hold the same level of crowdedness, we prepare another sparse training set by re-labeling full body box of persons in COCO (Lin et al. 2014) to further evaluate the potential of our method. [...] Table 1: Results on CrowdHuman val set.
Hardware Specification | Yes | We train the networks on 8 Nvidia V100 GPUs with 2 images on each GPU.
Software Dependencies | No | The paper mentions using "Mask R-CNN (He et al. 2017) model adopting ResNet-50 (He et al. 2016) as backbone" but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | During training, the short side of each image is resized to 800 and the long side is limited within 1400. Models are trained for 60k iterations starting from an initial learning rate of 0.02 (Faster R-CNN) or 0.01 (RetinaNet) and is reduced by 0.1 on 30k and 40k iters respectively. (A hedged configuration sketch based on these numbers follows the table.)
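To make the Experiment Setup row concrete, below is a minimal sketch of the reported schedule: short side resized to 800 with the long side capped at 1400, 60k iterations, base learning rate 0.02 (Faster R-CNN) or 0.01 (RetinaNet) decayed by 0.1 at 30k and 40k iterations, and an effective batch size of 16 (8 V100 GPUs x 2 images per GPU). The helper names and plain-Python form are illustrative assumptions, not the authors' unreleased training code.

```python
# Hedged sketch of the training schedule quoted in the Experiment Setup row.
# Function and variable names are illustrative, not taken from the paper.

def resize_shorter_side(width, height, short=800, max_long=1400):
    """Scale so the short side becomes `short`, capping the long side at `max_long`."""
    scale = short / min(width, height)
    if max(width, height) * scale > max_long:
        scale = max_long / max(width, height)
    return round(width * scale), round(height * scale)

def learning_rate(iteration, base_lr=0.02, steps=(30_000, 40_000), gamma=0.1):
    """Step schedule: base_lr, multiplied by 0.1 at 30k and again at 40k iterations."""
    lr = base_lr
    for step in steps:
        if iteration >= step:
            lr *= gamma
    return lr

TOTAL_ITERS = 60_000                     # 60k iterations
IMAGES_PER_GPU = 2                       # 2 images per GPU
NUM_GPUS = 8                             # 8 Nvidia V100 GPUs
BATCH_SIZE = IMAGES_PER_GPU * NUM_GPUS   # effective batch size of 16

if __name__ == "__main__":
    print(resize_shorter_side(1920, 1080))  # long side gets capped at 1400
    print(learning_rate(0), learning_rate(35_000), learning_rate(45_000))
    # 0.02, 0.002, 0.0002 (Faster R-CNN base LR; use base_lr=0.01 for RetinaNet)
```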
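The Pseudocode row refers to the paper's Algorithm 1, "Overlay Depth-aware NMS", which is not reproduced in this report. The following is therefore only a guess at the general shape of a depth-ordered suppression step for overlapping instances: boxes are processed from front to back, and a box that overlaps an already-kept, nearer box by more than a threshold is dropped. Every name, the depth ordering, the IoU criterion, and the threshold are assumptions for illustration; consult the paper for the actual algorithm.

```python
# Speculative sketch of a depth-ordered suppression step. This is NOT the
# paper's Algorithm 1; it is written from the algorithm's name alone.

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def depth_aware_suppress(boxes, depths, thresh=0.5):
    """Keep boxes front-to-back; drop a box that overlaps a nearer kept box too much.

    `boxes`: list of (x1, y1, x2, y2); `depths`: smaller value means closer to the camera.
    Returns the indices of the kept boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: depths[i])  # nearest first
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in kept):
            kept.append(i)
    return kept
```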