Unbiased IoU for Spherical Image Object Detection
Authors: Feng Dai, Bin Chen, Hang Xu, Yike Ma, Xiaodong Li, Bailan Feng, Peng Yuan, Chenggang Yan, Qiang Zhao
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments show that our unbiased IoU gives accurate results and the proposed Spherical CenterNet achieves better performance on one real-world and two synthetic spherical object detection datasets than existing methods. |
| Researcher Affiliation | Collaboration | 1Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 2University of Chinese Academy of Sciences, Beijing, China 3Hangzhou Dianzi University, Hangzhou, China 4Huawei Noah's Ark Lab |
| Pseudocode | Yes | Algorithm 1: Intersection Area Computation |
| Open Source Code | No | The paper does not include any explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We conduct the experiments on three datasets, including one real-world dataset 360-Indoor (Chou et al. 2020) composed of indoor 360 spherical images for object detection, and another two synthetic spherical datasets 360-VOC-Uniform and 360-VOC-Gaussian. 360-VOC-Gaussian is a synthetic 360 dataset generated from PASCAL VOC 2012 (Everingham et al. 2015). |
| Dataset Splits | Yes | This dataset has 18.6k training images, 6.3k validating images, and 3.1k testing images. |
| Hardware Specification | Yes | 8 GeForce RTX 2080Ti GPUs are used for training with a batch size of 32 (4 images per GPU). |
| Software Dependencies | No | Our method is implemented in PyTorch (Paszke et al. 2017). While PyTorch is mentioned, its specific version number is not provided, nor are any version numbers for other libraries or software components. |
| Experiment Setup | Yes | We use Adam (Kingma and Ba 2014) to optimize the overall parameters objective for 160 epochs with the initial learning rate 1.25 × 10⁻⁴, and the learning rate is divided by 10 at 90 and 120 epochs. The input resolution of the whole network is 1024 × 512, which is downsampled 4× through the model. During training, we only use random flip as data augmentation because of the particularity of Equirectangular projection. For the training loss of 360-Indoor dataset, we set λoff = 60 and λfov = 10 to balance the orders of magnitude for each loss term. For the other two 360-VOC-Uniform and 360-VOC-Gaussian datasets, we keep λoff = 1 and λfov = 0.1 in line with the original loss weights because each image only contains one object in these two datasets. |
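The paper's Algorithm 1 (Intersection Area Computation) evaluates intersection areas directly on the sphere rather than in the distorted projected plane. As a rough illustration of the geometric primitive involved, and not the authors' algorithm, the sketch below computes the area of a convex spherical polygon on the unit sphere via the spherical excess formula; the helper names and vertex layout are assumptions for this example.

```python
import math

def _normalize(v):
    # Scale a 3-vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def _interior_angle(a, b, c):
    # Interior angle at vertex b between the great-circle arcs b->a and
    # b->c, measured between the arcs' tangent vectors at b.
    dot_ab = sum(x * y for x, y in zip(a, b))
    dot_cb = sum(x * y for x, y in zip(c, b))
    t1 = _normalize(tuple(x - dot_ab * y for x, y in zip(a, b)))
    t2 = _normalize(tuple(x - dot_cb * y for x, y in zip(c, b)))
    cos_angle = max(-1.0, min(1.0, sum(x * y for x, y in zip(t1, t2))))
    return math.acos(cos_angle)

def spherical_polygon_area(vertices):
    # Girard-style spherical excess: for a convex polygon with n vertices
    # on the unit sphere, area = (sum of interior angles) - (n - 2) * pi.
    verts = [_normalize(v) for v in vertices]
    n = len(verts)
    angle_sum = sum(
        _interior_angle(verts[i - 1], verts[i], verts[(i + 1) % n])
        for i in range(n)
    )
    return angle_sum - (n - 2) * math.pi

# Sanity check: one octant of the unit sphere is a spherical triangle with
# three right angles, so its area is (4 * pi) / 8 = pi / 2.
octant = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(spherical_polygon_area(octant))  # ~pi / 2
```

Computing areas this way on the sphere itself is what lets an IoU be "unbiased": planar IoU on the equirectangular image over- or under-counts area depending on latitude, while spherical excess does not.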
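The training schedule reported above (initial learning rate 1.25 × 10⁻⁴, divided by 10 at epochs 90 and 120, over 160 epochs) is a standard step schedule. The following sketch only restates those stated hyperparameters; it is not the authors' training code, and the function name is an assumption.

```python
def step_lr(epoch, base_lr=1.25e-4, milestones=(90, 120), gamma=0.1):
    # Multiply the base learning rate by gamma once per milestone passed,
    # matching the paper's "divided by 10 at 90 and 120 epochs" schedule.
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

print(step_lr(0))    # initial rate, ~1.25e-04
print(step_lr(100))  # after the first drop, ~1.25e-05
print(step_lr(150))  # after the second drop, ~1.25e-06
```

In PyTorch this behavior would typically be obtained with a multi-step scheduler wrapped around the Adam optimizer, matching the setup described in the table.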