Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Zero-Shot Aerial Object Detection with Visual Description Regularization
Authors: Zhengqing Zang, Chenyu Lin, Chenwei Tang, Tao Wang, Jiancheng Lv
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments with three challenging aerial object detection datasets, including DIOR, xView, and DOTA. The results demonstrate that DescReg significantly outperforms the state-of-the-art ZSD methods with complex projection designs and generative frameworks, e.g., DescReg outperforms best reported ZSD method on DIOR by 4.5 mAP on unseen classes and 8.1 in HM. We further show the generalizability of DescReg by integrating it into generative ZSD methods as well as varying the detection architecture. |
| Researcher Affiliation | Academia | (1) College of Computer Science, Sichuan University, Chengdu, 610065, P. R. China; (2) Engineering Research Center of Machine Learning and Industry Intelligence, Ministry of Education, Chengdu, 610065, P. R. China |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. It provides mathematical formulations and a system diagram, but no step-by-step code-like description of the algorithms. |
| Open Source Code | Yes | Codes will be released at https://github.com/zq-zang/DescReg. |
| Open Datasets | Yes | We evaluate the proposed method on three challenging remote sensing image object detection datasets: DIOR (Li et al. 2019a), xView (Lam et al. 2018), and DOTA (Xia et al. 2017). |
| Dataset Splits | Yes | For DIOR, we follow the setting in prior work (Huang et al. 2022). For xView and DOTA, we conduct semantic clustering and sample classes within clusters to ensure unseen class diversity and semantic relatedness (Rahman, Khan, and Porikli 2018; Huang et al. 2022). The resulting xView contains 48 seen classes and 12 unseen classes, and the resulting DOTA contains 11 seen classes and 4 unseen classes. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware used for running experiments, such as GPU models, CPU types, or memory specifications. It only mentions general detection architectures like Faster R-CNN and YOLOv8. |
| Software Dependencies | No | The paper mentions using Faster R-CNN, ResNet101, and YOLOv8 models but does not provide specific version numbers for software dependencies like Python, PyTorch, or other libraries used in the implementation. |
| Experiment Setup | Yes | We also observe the temperature value of 0.03 achieves the best performance, which is slightly higher than 0.01 and 0.05. |
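
The Experiment Setup row reports a temperature of 0.03, compared against 0.01 and 0.05. As a rough illustration of what such a hyperparameter typically controls, the sketch below applies a temperature-scaled softmax to a few similarity scores. This is an assumption about the general technique, not the paper's implementation: the function name `softmax_with_temperature` and the example scores are hypothetical, and the summary above does not say where the temperature enters the method.

```python
import math


def softmax_with_temperature(similarities, temperature=0.03):
    """Turn similarity scores into probabilities, sharpened by a temperature.

    Hypothetical helper for illustration only; smaller temperatures
    concentrate more probability mass on the highest-scoring entry.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in similarities]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up similarity scores between one region feature and three classes.
example_similarities = [0.30, 0.25, 0.10]

for t in (0.01, 0.03, 0.05):
    probs = softmax_with_temperature(example_similarities, t)
    print(t, [round(p, 3) for p in probs])
```

With these made-up scores, the lower temperatures put more weight on the top-scoring class and the higher ones spread it more evenly, which is the trade-off a temperature sweep of this kind explores.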