Object Detection in Densely Packed Scenes via Semi-Supervised Learning with Dual Consistency

Authors: Chao Ye, Huaidong Zhang, Xuemiao Xu, Weiwei Cai, Jing Qin, Kup-Sze Choi

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted over both the Rebar DSC dataset and the famous large public dataset SKU-110K. Experimental results corroborate that the proposed method is able to improve the object detection performance in densely packed scenes, consistently outperforming state-of-the-art approaches.
Researcher Affiliation | Academia | (1) South China University of Technology; (2) Centre for Smart Health, The Hong Kong Polytechnic University; (3) State Key Laboratory of Subtropical Building Science; (4) Ministry of Education Key Laboratory of Big Data and Intelligent Robot; (5) Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states, "Dataset is available in https://github.com/Armin1337/RebarDSC", which links only to the dataset. It provides no repository link or explicit statement about releasing source code for the described method.
Open Datasets | Yes | We evaluate our framework on the SKU-110K [Goldman et al., 2019] and Rebar DSC datasets. Dataset is available in https://github.com/Armin1337/RebarDSC.
Dataset Splits | No | For the Rebar DSC dataset, we randomly select 1,000 images as the training set, and consider the other 1,125 images as the test set. The paper specifies train and test splits but does not explicitly mention a validation split or its size/percentage for either dataset.
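The reported Rebar DSC protocol (1,000 training images, 1,125 test images, no validation set) amounts to a simple random partition. The sketch below illustrates one way to reproduce such a split; the function name and fixed seed are illustrative assumptions, not taken from the paper.

```python
import random

def split_rebar_dsc(image_ids, n_train=1000, seed=0):
    """Randomly pick n_train images for training; the remainder form the
    test set (the paper reports 1,000 train / 1,125 test for Rebar DSC).
    The seed is a hypothetical choice for reproducibility."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]
```

With 2,125 image IDs this yields disjoint sets of 1,000 and 1,125 images, matching the sizes reported in the paper.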
Hardware Specification | Yes | Our framework was implemented in PyTorch, using one NVIDIA GeForce 3090.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number or list any other software dependencies with their versions.
Experiment Setup | Yes | We initialize Faster R-CNN with parameters trained on the COCO dataset [Lin et al., 2014] and pre-train it on all available labeled samples for 12 epochs under standard supervised learning. We then initialize the student and teacher networks with the pre-trained weights. With the initialized models, we train the student network on both labeled and unlabeled data by minimizing the supervised and consistency losses for an additional 12 epochs. The student network is trained with an SGD optimizer (momentum = 0.9, weight decay = 0.0001, learning rate = 0.0025). Each training batch contains three samples: one labeled and two unlabeled. All input images are resized so that their shorter side is 1,200 pixels. The weights in the consistency loss functions are set as λiou = 1, λc = 0.5, λp = 1, λd = 2, λr1 = 2, λr2 = 1, chosen by cross-validation on the training set. Following previous works [Laine and Aila, 2016; Tarvainen and Valpola, 2017], we also ramp up the coefficient of the consistency loss and gradually increase the EMA decay factor λema from 0 to 0.99 over the epochs.
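The mean-teacher schedule described in this setup (a ramped consistency-loss coefficient and an EMA decay factor growing from 0 to 0.99) can be sketched with plain Python. The sigmoid-shaped ramp-up follows the form popularized by Laine and Aila (2016); the exact ramp shape, schedule lengths, and helper names below are assumptions for illustration, not details confirmed by the paper.

```python
import math

def consistency_rampup(epoch, rampup_epochs=12):
    """Sigmoid-shaped ramp-up weight, exp(-5 * (1 - t)^2), with t rising
    from 0 to 1 over rampup_epochs (a common choice; assumed here)."""
    t = min(epoch / rampup_epochs, 1.0)
    return math.exp(-5.0 * (1.0 - t) ** 2)

def ema_decay(epoch, total_epochs=12, max_decay=0.99):
    """Gradually increase the EMA decay factor lambda_ema from 0 to 0.99
    over the training epochs (linear schedule assumed)."""
    return max_decay * min(epoch / total_epochs, 1.0)

def ema_update(teacher_params, student_params, decay):
    """Mean-teacher update: theta_teacher <- decay * theta_teacher
    + (1 - decay) * theta_student, applied element-wise."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]
```

Early in training the teacher closely tracks the student (decay near 0) and the consistency loss contributes little; by the final epoch the consistency weight reaches its full value and the teacher becomes a slowly moving average with decay 0.99.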