SAUI: Scale-Aware Unseen Imagineer for Zero-Shot Object Detection

Authors: Jiahao Wang, Caixia Yan, Weizhan Zhang, Huan Liu, Hao Sun, Qinghua Zheng

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the PASCAL VOC, COCO and DIOR datasets demonstrate SAUI's better performance in different scenarios, especially for scale-varying and small objects. Notably, SAUI achieves new state-of-the-art performance on COCO and DIOR.
Researcher Affiliation | Collaboration | (1) School of Computer Science and Technology, MOEKLINNS Laboratory, Xi'an Jiaotong University; (2) China Telecom Artificial Intelligence Technology Co., Ltd.
Pseudocode | No | The paper describes the training procedure and model components but does not provide any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We evaluate SAUI on two typical ZSD datasets, i.e. PASCAL VOC 2007+2012 (Everingham et al. 2010), MS COCO 2014 (Lin et al. 2014) and one remote sensing detection dataset, i.e. DIOR (Li et al. 2020).
Dataset Splits | No | The paper describes seen/unseen class splits for ZSD, e.g., a '16/4 split' and '48/17 & 65/15 splits', which define the training and test sets by class type. However, it does not describe a distinct validation split (e.g., a percentage or count held out for hyperparameter tuning) within the training data.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions using the 'CLIP text encoder (Radford et al. 2021)' but does not specify version numbers for CLIP or other software dependencies such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We set Nv to 4 so that features from 4 scale-view channels are considered. Both the generator Gj and discriminator Dj are implemented as two-layer fully-connected networks with 4096 hidden units per layer. For MS COCO/DIOR/PASCAL VOC, we train Faster RCNN for 14/14/18 epochs respectively. We set α1 in Eq. (1) to 10^-1/10^-1/10^-2 and sampling radius r to 10^-4/10^-4/10^-6, respectively. Besides, α2, α3, α4 in Eq. (1) are set to 10^-3, 10^-3, 10^-4, while the temperature coefficient τ is set to 10^-1. The number of negative samples N is 10.
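The reported hyperparameters can be gathered into a single configuration sketch. This is purely illustrative: the dictionary keys, the helper function, and the assumption that the garbled exponents in the extracted text (e.g. "10 1") denote negative powers of ten (10^-1, etc.) are all ours, not from the authors' code, which is not released.

```python
# Hypothetical configuration sketch for the SAUI experiment setup.
# All names are illustrative; per-dataset values follow the order
# MS COCO / DIOR / PASCAL VOC reported in the paper. Negative
# exponents are assumed from the (garbled) extracted text.

SAUI_CONFIG = {
    "num_scale_views": 4,          # Nv: scale-view channels
    "gan_num_layers": 2,           # generator Gj / discriminator Dj depth
    "gan_hidden_units": 4096,      # hidden units per FC layer
    "epochs": {"coco": 14, "dior": 14, "voc": 18},
    "alpha1": {"coco": 1e-1, "dior": 1e-1, "voc": 1e-2},
    "sampling_radius": {"coco": 1e-4, "dior": 1e-4, "voc": 1e-6},
    "alpha2": 1e-3,
    "alpha3": 1e-3,
    "alpha4": 1e-4,
    "temperature": 1e-1,           # contrastive temperature tau
    "num_negatives": 10,           # N negative samples
}

def get_dataset_config(dataset: str) -> dict:
    """Flatten the config for one dataset ('coco', 'dior', or 'voc')."""
    cfg = {k: v for k, v in SAUI_CONFIG.items() if not isinstance(v, dict)}
    for key in ("epochs", "alpha1", "sampling_radius"):
        cfg[key] = SAUI_CONFIG[key][dataset]
    return cfg
```

For example, `get_dataset_config("voc")` would yield 18 training epochs with α1 = 1e-2, matching the third position in each reported triple.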