Polygon-to-Polygon Distance Loss for Rotated Object Detection

Authors: Yang Yang, Jifeng Chen, Xiaopin Zhong, Yuanlong Deng

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We executed extensive experiments on the commonly used DOTA and HRSC2016 datasets to demonstrate the performance of P2P Loss and compared it with other key detectors. RetinaNet with P2P Loss achieves 79.155% mAP on the DOTA dataset, which is preferable to state-of-the-art rotation detectors.
Researcher Affiliation | Academia | Lab. of Machine Vision and Inspection, College of Mechatronics and Control Engineering, Shenzhen University, China. 1910294008@email.szu.edu.cn, chenjifeng2020@email.szu.edu.cn, xzhong@szu.edu.cn, dengyl@szu.edu.cn
Pseudocode | No | The paper describes its method mathematically but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | Yes | The aerial image dataset DOTA (Xia et al. 2018) consists of 2806 images ranging in size from 800×800 to 4000×4000 pixels and contains 188282 objects. HRSC2016 (Liu et al. 2017) is a high-resolution ship detection dataset with 436, 181, and 444 images for training, validation, and testing, respectively.
Dataset Splits | Yes | For DOTA, the ratios of the training, validation, and test sets are 1/2, 1/6, and 1/3, respectively; for HRSC2016, the training, validation, and test sets contain 436, 181, and 444 images, respectively (a split-size sketch follows the table).
Hardware Specification | Yes | We use 4 GeForce RTX 3090 GPUs with a total of 8 images per minibatch (2 images per GPU) for training and a single GeForce RTX 3090 GPU for inference.
Software Dependencies | No | The paper does not provide version numbers for ancillary software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We preset 3 anchors with aspect ratios of {0.5, 1.0, 2.0} and an angle of 0 at each position of the pyramidal features at each level by default, unless otherwise specified. All experiments are trained on the hardware listed above with the Adam (Kingma and Ba 2014) optimizer at a learning rate of 0.0001, and random flipping is used to avoid overfitting during training, with no other tricks unless specified (a configuration sketch follows the table).
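
The DOTA split ratios quoted above do not divide the 2806 images exactly, so any concrete split must round somewhere. The following minimal sketch (ours, not from the authors' unreleased code; `split_dataset` is a hypothetical helper) shows one way the 1/2 : 1/6 : 1/3 ratios could map onto image counts:

```python
# Sketch of splitting the 2806 DOTA images by the stated ratios. Because the
# ratios do not divide 2806 exactly, the test set absorbs the rounding error.
import random

def split_dataset(ids, ratios=(1/2, 1/6, 1/3), seed=0):
    """Shuffle `ids` and cut them into train/val/test by the given ratios."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * ratios[0])
    n_val = round(len(ids) * ratios[1])
    train, val = ids[:n_train], ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]          # remainder absorbs rounding error
    return train, val, test

train, val, test = split_dataset(range(2806))
print(len(train), len(val), len(test))    # 1403 468 935
```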
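Similarly, the Experiment Setup row can be read as a concrete configuration. The sketch below assumes a RetinaNet-style rotated detector with anchors parameterized as (width, height, angle); `make_rotated_anchors` and the toy regression head are illustrative placeholders, since the paper releases no code:

```python
# Sketch of the stated anchor and optimizer settings, under the assumptions
# named above; this is not the authors' implementation.
import math
import torch
import torch.nn as nn

def make_rotated_anchors(base_size, aspect_ratios=(0.5, 1.0, 2.0), angle=0.0):
    """One (w, h, angle) anchor per aspect ratio at a single feature-map
    location, keeping the anchor area fixed at base_size**2."""
    return torch.tensor([(base_size * math.sqrt(ar),  # width grows with ratio
                          base_size / math.sqrt(ar),  # height shrinks with it
                          angle)                      # all anchors at angle 0
                         for ar in aspect_ratios])

# 3 anchors per location, all at angle 0, as the paper states.
anchors = make_rotated_anchors(base_size=32)

# Toy regression head: 3 anchors x (x, y, w, h, angle) offsets per location.
head = nn.Conv2d(256, 3 * 5, kernel_size=3, padding=1)

# Adam with learning rate 1e-4, matching the reported optimizer setting.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
```

With an area-preserving parameterization like this, the three aspect ratios yield anchors of equal area but different elongation, which is the usual way a RetinaNet-style head covers object shapes at each pyramid level.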