SAUI: Scale-Aware Unseen Imagineer for Zero-Shot Object Detection
Authors: Jiahao Wang, Caixia Yan, Weizhan Zhang, Huan Liu, Hao Sun, Qinghua Zheng
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the PASCAL VOC, COCO and DIOR datasets demonstrate SAUI's better performance in different scenarios, especially for scale-varying and small objects. Notably, SAUI achieves new state-of-the-art performance on COCO and DIOR. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science and Technology, MOEKLINNS Laboratory, Xi'an Jiaotong University; (2) China Telecom Artificial Intelligence Technology Co., Ltd. |
| Pseudocode | No | The paper describes the training procedure and model components but does not provide any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate SAUI on two typical ZSD datasets, i.e. PASCAL VOC 2007+2012 (Everingham et al. 2010), MS COCO 2014 (Lin et al. 2014) and one remote sensing detection dataset, i.e. DIOR (Li et al. 2020). |
| Dataset Splits | No | The paper describes seen/unseen class splits for ZSD, e.g., the '16/4 split' and '48/17&65/15 split', which determine the training and test sets by class membership. However, it does not describe a distinct validation split (e.g., a percentage or count held out for hyperparameter tuning) that would be needed for full reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions using 'CLIP text encoder (Radford et al. 2021)' but does not specify version numbers for CLIP or other software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We set Nv to 4 so that features from 4 scale-view channels are considered. Both the generator Gj and discriminator Dj are implemented as two-layer fully-connected networks with 4096 hidden units per layer. For MS COCO/DIOR/PASCAL VOC, we train Faster RCNN for 14/14/18 epochs respectively. We set α1 in Eq. (1) to 10^-1/10^-1/10^-2 and the sampling radius r to 10^-4/10^-4/10^-6, respectively. Besides, α2, α3, α4 in Eq. (1) are set to 10^-3, 10^-3, 10^-4, while the temperature coefficient τ is set to 10^-1. The number of negative samples N is 10. |
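The generator/discriminator architecture quoted above (two-layer fully-connected networks with 4096 hidden units per layer) can be sketched as below. This is a minimal NumPy illustration, not the authors' code: the input semantic dimension `SEM_DIM`, the visual feature dimension `FEAT_DIM`, and the weight initialization are assumptions, since the paper does not specify them here.

```python
import numpy as np

# Assumed dimensions (not stated in the quoted setup): semantic embedding size,
# hidden width from the paper (4096), and synthesized visual feature size.
SEM_DIM, HIDDEN, FEAT_DIM = 512, 4096, 1024

rng = np.random.default_rng(0)

def init_mlp(d_in, d_hidden, d_out):
    """Parameters of a two-layer fully-connected network."""
    return {
        "W1": rng.normal(0, 0.02, (d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(0, 0.02, (d_hidden, d_out)),
        "b2": np.zeros(d_out),
    }

def mlp_forward(p, x):
    # ReLU hidden layer followed by a linear output layer.
    h = np.maximum(0.0, x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"]

# Generator G_j: class semantics -> synthesized visual features.
generator = init_mlp(SEM_DIM, HIDDEN, FEAT_DIM)
# Discriminator D_j: visual features -> real/fake logit.
discriminator = init_mlp(FEAT_DIM, HIDDEN, 1)

sem = rng.normal(size=(8, SEM_DIM))      # a batch of class embeddings
fake_feats = mlp_forward(generator, sem)
logits = mlp_forward(discriminator, fake_feats)
print(fake_feats.shape, logits.shape)    # (8, 1024) (8, 1)
```

In a full ZSD feature-synthesis pipeline, one such generator/discriminator pair would be trained adversarially per scale-view channel; this sketch only shows the forward shapes implied by the reported architecture.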