Weakly Supervised Open-Vocabulary Object Detection

Authors: Jianghang Lin, Yunhang Shen, Bingquan Wang, Shaohui Lin, Ke Li, Liujuan Cao

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on Pascal VOC and MS COCO demonstrate that the proposed WSOVOD achieves new state-of-the-art results compared with previous WSOD methods in both close-set object localization and detection tasks. Meanwhile, WSOVOD enables cross-dataset and open-vocabulary learning to achieve on-par or even better performance than well-established fully-supervised open-vocabulary object detection (FSOVOD).
Researcher Affiliation | Collaboration | 1. Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China; 2. Tencent Youtu Lab, China; 3. School of Computer Science and Technology, East China Normal University, China.
Pseudocode | No | No pseudocode or clearly labeled algorithm block is present in the paper.
Open Source Code | No | No explicit statement or link regarding the release of open-source code for the described methodology is provided in the paper.
Open Datasets | Yes | We evaluate the proposed WSOVOD framework on Pascal VOC 2007, 2012 (Everingham et al. 2010) and MS COCO (Lin et al. 2014), which are widely used for WSOD. In addition, we also use ILSVRC (Russakovsky et al. 2015) and LVIS (Gupta, Dollár, and Girshick 2019) for open-vocabulary learning, both of which are widely used for FSOVOD.
Dataset Splits | No | The paper mentions splitting COCO into novel and base classes for evaluation, and refers to a 'common setting' or 'standard Pascal VOC protocol' for metrics, but does not provide specific percentages, sample counts, or explicit details about the training, validation, and test dataset splits needed for reproduction.
Hardware Specification | Yes | We use synchronized SGD training on Nvidia 3090 GPUs with a batch size of 4, a mini-batch involving 1 image per GPU.
Software Dependencies | No | The paper mentions training configurations like 'synchronized SGD training' and model architectures (VGG16, RN18/50-WS-MRRP) but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions).
Experiment Setup | Yes | We use synchronized SGD training on Nvidia 3090 GPUs with a batch size of 4, a mini-batch involving 1 image per GPU. We use learning rates of 1e-3 and 1e-2 for the VGG16 and RN18/50-WS-MRRP backbones, respectively, a momentum of 0.9, a dropout rate of 0.5, and a learning rate decay weight of 0.1.
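The optimization hyperparameters quoted in the row above can be summarized as a minimal, framework-agnostic configuration sketch. This is not the authors' code; the decay milestone below is an assumption for illustration only, since the paper does not state at which iteration the learning rate is decayed.

```python
# Sketch of the reported WSOVOD training hyperparameters.
# Reported in the paper: base LR 1e-3 (VGG16) or 1e-2 (RN18/50-WS-MRRP),
# momentum 0.9, dropout 0.5, LR decay weight 0.1, batch size 4 (1 image/GPU).
# Assumption: the decay milestone (30000 iterations) is hypothetical.

def lr_at_step(base_lr, step, milestones, decay_factor=0.1):
    """Step-wise schedule: multiply the LR by decay_factor at each milestone passed."""
    lr = base_lr
    for milestone in milestones:
        if step >= milestone:
            lr *= decay_factor
    return lr

CONFIG = {
    "base_lr": {"vgg16": 1e-3, "rn50_ws_mrrp": 1e-2},  # per-backbone, as reported
    "momentum": 0.9,
    "dropout": 0.5,
    "decay_factor": 0.1,
    "batch_size": 4,          # 1 image per GPU across 4 GPUs
    "milestones": [30000],    # hypothetical: not given in the paper
}
```

With these values, the RN18/50-WS-MRRP learning rate would start at 1e-2 and drop to 1e-3 once the (assumed) milestone is reached.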