Zero-Shot Aerial Object Detection with Visual Description Regularization
Authors: Zhengqing Zang, Chenyu Lin, Chenwei Tang, Tao Wang, Jiancheng Lv
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments with three challenging aerial object detection datasets, including DIOR, xView, and DOTA. The results demonstrate that DescReg significantly outperforms the state-of-the-art ZSD methods with complex projection designs and generative frameworks, e.g., DescReg outperforms best reported ZSD method on DIOR by 4.5 mAP on unseen classes and 8.1 in HM. We further show the generalizability of DescReg by integrating it into generative ZSD methods as well as varying the detection architecture. |
| Researcher Affiliation | Academia | (1) College of Computer Science, Sichuan University, Chengdu, 610065, P. R. China; (2) Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, Chengdu, 610065, P. R. China. {2022223045158, 2022223040017}@stu.scu.edu.cn, tangchenwei@scu.edu.cn, twangnh@gmail.com, lvjiancheng@scu.edu.cn |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. It provides mathematical formulations and a system diagram, but no step-by-step code-like description of the algorithms. |
| Open Source Code | Yes | Codes will be released at https://github.com/zq-zang/DescReg. |
| Open Datasets | Yes | We evaluate the proposed method on three challenging remote sensing image object detection datasets: DIOR (Li et al. 2019a), xView (Lam et al. 2018), and DOTA (Xia et al. 2017). |
| Dataset Splits | Yes | For DIOR, we follow the setting in prior work (Huang et al. 2022). For xView and DOTA, we conduct semantic clustering and sample classes within clusters to ensure unseen class diversity and semantic relatedness (Rahman, Khan, and Porikli 2018; Huang et al. 2022). The resulting xView contains 48 seen classes and 12 unseen classes, and the resulting DOTA contains 11 seen classes and 4 unseen classes. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware used for running experiments, such as GPU models, CPU types, or memory specifications. It only mentions general detection architectures like Faster R-CNN and YOLOv8. |
| Software Dependencies | No | The paper mentions using Faster R-CNN, ResNet101, and YOLOv8 models but does not provide specific version numbers for software dependencies like Python, PyTorch, or other libraries used in the implementation. |
| Experiment Setup | Yes | We also observe the temperature value of 0.03 achieves the best performance, which is slightly higher than 0.01 and 0.05. |
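
Two quantities in the table can be made concrete with a small sketch. It assumes that HM is the harmonic mean of seen-class and unseen-class mAP, as is conventional in generalized zero-shot detection, and that the temperature divides similarity scores before a softmax. The second assumption is illustrative only, since the table does not say where the temperature enters the loss. The function names `harmonic_mean` and `softmax_with_temperature` are hypothetical and are not taken from the authors' code.

```python
import math


def harmonic_mean(seen_map: float, unseen_map: float) -> float:
    """Harmonic mean of seen and unseen mAP (assumed definition of HM)."""
    if seen_map + unseen_map == 0:
        return 0.0
    return 2.0 * seen_map * unseen_map / (seen_map + unseen_map)


def softmax_with_temperature(scores, temperature):
    """Softmax over scores / temperature (assumed role of the temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(harmonic_mean(60.0, 20.0))
    similarities = [0.9, 0.5, 0.1]
    for t in (0.01, 0.03, 0.05):
        print(t, softmax_with_temperature(similarities, t))
```

Under these assumptions, a lower temperature concentrates more probability on the highest-scoring class, which is one reason a value such as 0.03 can sit between an overly peaked and an overly flat distribution.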