Unbiased IoU for Spherical Image Object Detection

Authors: Feng Dai, Bin Chen, Hang Xu, Yike Ma, Xiaodong Li, Bailan Feng, Peng Yuan, Chenggang Yan, Qiang Zhao

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments show that our unbiased IoU gives accurate results and the proposed Spherical CenterNet achieves better performance on one real-world and two synthetic spherical object detection datasets than existing methods.
Researcher Affiliation | Collaboration | (1) Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; (2) University of Chinese Academy of Sciences, Beijing, China; (3) Hangzhou Dianzi University, Hangzhou, China; (4) Huawei Noah's Ark Lab
Pseudocode | Yes | Algorithm 1: Intersection Area Computation
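The paper's Algorithm 1 computes the spherical intersection area in closed form; that exact procedure is not reproduced in this report. As an illustration of what an unbiased spherical IoU measures, the sketch below estimates it numerically by Monte Carlo sampling on the unit sphere. It assumes the paper's spherical-rectangle parameterization (centre longitude θ, latitude φ, horizontal/vertical FoVs, with box boundaries on great circles); the function names are hypothetical, and this is a sanity-check approximation, not the paper's method.

```python
import numpy as np

def unit_vec(theta, phi):
    """Unit vector for longitude theta, latitude phi (radians)."""
    return np.stack([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)], axis=-1)

def in_spherical_box(points, box):
    """points: (N, 3) unit vectors; box: (theta, phi, fov_h, fov_v) in radians.
    The box is the region bounded by four great circles, symmetric about its centre."""
    theta, phi, fov_h, fov_v = box
    w = unit_vec(theta, phi)                  # viewing direction (local z)
    u = np.cross([0.0, 0.0, 1.0], w)          # local east (degenerate at the poles)
    u /= np.linalg.norm(u)
    v = np.cross(w, u)                        # local north
    x, y, z = points @ u, points @ v, points @ w
    # Inside iff in front of the centre plane and within both angular extents.
    return (z > 0) & (np.abs(np.arctan2(x, z)) <= fov_h / 2) \
                   & (np.abs(np.arctan2(y, z)) <= fov_v / 2)

def mc_spherical_iou(box1, box2, n=200_000, seed=0):
    """Monte Carlo spherical IoU via uniform sampling on S^2."""
    rng = np.random.default_rng(seed)
    p = rng.normal(size=(n, 3))
    p /= np.linalg.norm(p, axis=1, keepdims=True)   # uniform on the sphere
    a, b = in_spherical_box(p, box1), in_spherical_box(p, box2)
    union = np.count_nonzero(a | b)
    return np.count_nonzero(a & b) / union if union else 0.0
```

For two identical boxes this returns ≈ 1, and the estimate converges to the exact spherical IoU as n grows, so it can serve as a cross-check against a closed-form implementation of Algorithm 1.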
Open Source Code | No | The paper does not include any explicit statement about releasing source code for the described methodology or a link to a code repository.
Open Datasets | Yes | We conduct the experiments on three datasets, including one real-world dataset 360-Indoor (Chou et al. 2020) composed of indoor 360° spherical images for object detection, and another two synthetic spherical datasets, 360-VOC-Uniform and 360-VOC-Gaussian. 360-VOC-Gaussian is a synthetic 360° dataset generated from PASCAL VOC 2012 (Everingham et al. 2015).
Dataset Splits | Yes | This dataset has 18.6k training images, 6.3k validation images, and 3.1k testing images.
Hardware Specification | Yes | 8 GeForce RTX 2080Ti GPUs are used for training with a batch size of 32 (4 images per GPU).
Software Dependencies | No | Our method is implemented in PyTorch (Paszke et al. 2017). While PyTorch is mentioned, its specific version number is not provided, nor are any version numbers for other libraries or software components.
Experiment Setup | Yes | We use Adam (Kingma and Ba 2014) to optimize the overall objective for 160 epochs with an initial learning rate of 1.25 × 10⁻⁴; the learning rate is divided by 10 at epochs 90 and 120. The input resolution of the whole network is 1024 × 512, which is downsampled by a factor of 4 through the model. During training, we only use random flip as data augmentation because of the particularity of the equirectangular projection. For the training loss on the 360-Indoor dataset, we set λoff = 60 and λfov = 10 to balance the orders of magnitude of each loss term. For the other two datasets, 360-VOC-Uniform and 360-VOC-Gaussian, we keep λoff = 1 and λfov = 0.1 in line with the original loss weights because each image contains only one object in these datasets.
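For concreteness, here is a minimal PyTorch sketch of the quoted optimisation schedule. The model is a stand-in (the report notes no code release for Spherical CenterNet) and the loss terms appear only as comments; only the optimizer choice, learning-rate milestones, epoch count, batch size, and loss weights are taken from the paper.

```python
import torch
from torch import nn

# Stand-in module; the paper's Spherical CenterNet is not publicly released.
model = nn.Conv2d(3, 64, kernel_size=3, padding=1)

# Adam with initial lr 1.25e-4; lr divided by 10 at epochs 90 and 120.
optimizer = torch.optim.Adam(model.parameters(), lr=1.25e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[90, 120], gamma=0.1)

# Loss weights for 360-Indoor; 360-VOC-Uniform/-Gaussian use 1 and 0.1 instead.
lambda_off, lambda_fov = 60.0, 10.0

for epoch in range(160):
    # One pass over the training set would go here: batches of 32 images
    # (4 per GPU on 8 GPUs) at 1024x512 input resolution, optimising
    #   L = L_heatmap + lambda_off * L_offset + lambda_fov * L_fov
    scheduler.step()
```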