Zero-Shot Aerial Object Detection with Visual Description Regularization

Authors: Zhengqing Zang, Chenyu Lin, Chenwei Tang, Tao Wang, Jiancheng Lv

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments with three challenging aerial object detection datasets, including DIOR, xView, and DOTA. The results demonstrate that DescReg significantly outperforms the state-of-the-art ZSD methods with complex projection designs and generative frameworks; e.g., DescReg outperforms the best reported ZSD method on DIOR by 4.5 mAP on unseen classes and 8.1 in HM. We further show the generalizability of DescReg by integrating it into generative ZSD methods as well as varying the detection architecture.
Researcher Affiliation | Academia | (1) College of Computer Science, Sichuan University, Chengdu, 610065, P. R. China; (2) Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, Chengdu, 610065, P. R. China. {2022223045158, 2022223040017}@stu.scu.edu.cn, tangchenwei@scu.edu.cn, twangnh@gmail.com, lvjiancheng@scu.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. It provides mathematical formulations and a system diagram, but no step-by-step code-like description of the algorithms.
Open Source Code | Yes | Codes will be released at https://github.com/zq-zang/DescReg.
Open Datasets | Yes | We evaluate the proposed method on three challenging remote sensing image object detection datasets: DIOR (Li et al. 2019a), xView (Lam et al. 2018), and DOTA (Xia et al. 2017).
Dataset Splits | Yes | For DIOR, we follow the setting in prior work (Huang et al. 2022). For xView and DOTA, we conduct semantic clustering and sample classes within clusters to ensure unseen class diversity and semantic relatedness (Rahman, Khan, and Porikli 2018; Huang et al. 2022). The resulting xView split contains 48 seen classes and 12 unseen classes, and the resulting DOTA split contains 11 seen classes and 4 unseen classes.
Hardware Specification | No | The paper does not explicitly state the specific hardware used for running experiments, such as GPU models, CPU types, or memory specifications. It only mentions general detection architectures like Faster R-CNN and YOLOv8.
Software Dependencies | No | The paper mentions using Faster R-CNN, ResNet101, and YOLOv8 models but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries used in the implementation.
Experiment Setup | Yes | We also observe that a temperature value of 0.03 achieves the best performance, slightly outperforming 0.01 and 0.05.
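For context on what this temperature sweep is tuning: in embedding-based zero-shot detection, a temperature typically scales region-to-class similarity scores before a softmax, controlling how sharp the resulting class distribution is. The sketch below is illustrative only — the function name, values, and use of cosine similarities are assumptions, not details taken from the paper.

```python
import numpy as np

def temperature_softmax(similarities, temperature=0.03):
    """Convert similarity scores to class probabilities with temperature scaling.

    A smaller temperature sharpens the distribution over classes; sweeps such
    as {0.01, 0.03, 0.05} trade off sharpness against gradient spread.
    (Illustrative sketch; not the paper's implementation.)
    """
    logits = np.asarray(similarities, dtype=float) / temperature
    logits -= logits.max()          # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

# Hypothetical cosine similarities between one region embedding
# and three class (semantic) embeddings:
sims = [0.6, 0.4, 0.1]
probs = temperature_softmax(sims, temperature=0.03)
```

At temperature 0.03 the 0.2 gap between the top two similarities already produces a near-one-hot distribution, which is why small changes in this hyperparameter can shift detection performance noticeably.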