Multi-modal Queried Object Detection in the Wild

Authors: Yifan Xu, Mengdan Zhang, Chaoyou Fu, Peixian Chen, Xiaoshan Yang, Ke Li, Changsheng Xu

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that multi-modal queries largely boost open-world detection. For instance, MQ-Det significantly improves the state-of-the-art open-set detector GLIP by +7.8% AP on the LVIS benchmark via multi-modal queries without any downstream finetuning, and by +6.3% AP on average across 13 few-shot downstream tasks, with merely 3% additional modulating time on top of GLIP.
Researcher Affiliation | Collaboration | Yifan Xu (1,3), Mengdan Zhang (2), Chaoyou Fu (2), Peixian Chen (2), Xiaoshan Yang (1,3,4), Ke Li (2), Changsheng Xu (1,3,4). 1: MAIS, Institute of Automation, Chinese Academy of Sciences; 2: Tencent Youtu Lab; 3: School of Artificial Intelligence, University of the Chinese Academy of Sciences; 4: Peng Cheng Laboratory.
Pseudocode | No | The paper describes its methods through architectural diagrams (Figure 1) and textual explanations, but does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/YifanXu74/MQ-Det.
Open Datasets | Yes | Objects365 dataset [36] is a large-scale, high-quality dataset for object detection. We use this dataset to conduct the modulated pre-training of our MQ-Det models... LVIS benchmark [13] is a challenging dataset for long-tail objects... ODinW benchmark [23] (Object Detection in the Wild) is a more challenging benchmark for evaluating model performance under real-world scenarios.
Dataset Splits | Yes | We report on MiniVal containing 5,000 images introduced in MDETR [20] as well as the full validation set v1.0. During finetuning-free evaluation, we extract 5 instances as vision queries for each category from the downstream training set without any finetuning (see the extraction sketch after the table).
Hardware Specification | Yes | We conduct modulated pre-training of our models on the Objects365 dataset [36] for only one epoch using 8 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions using components like BERT [8] and CLIP [32] but does not provide a specific list of software dependencies with version numbers (e.g., programming language versions, library versions, or specific framework versions) necessary for replication.
Experiment Setup | Yes | We report the hyper-parameter settings of the modulated pre-training of MQ-Det in Tab. VI. Other settings are the same as those of the corresponding language-queried detectors. (Table VI: lr of GCP = 1e-5; mask rate = 40%. See the config sketch after the table.)
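
The Dataset Splits row states that 5 instances per category are extracted from the downstream training set as vision queries, with no finetuning. Below is a minimal sketch of that sampling step, assuming COCO-style annotations; the function name extract_vision_queries and the field names are illustrative and not taken from the MQ-Det repository.

```python
# Hypothetical sketch: sample k exemplar instances per category from a
# COCO-style training set to serve as vision queries (no finetuning).
import json
import random
from collections import defaultdict

def extract_vision_queries(annotation_file: str, k: int = 5, seed: int = 0):
    """Return {category_id: [annotation, ...]} with up to k instances each."""
    with open(annotation_file) as f:
        coco = json.load(f)

    by_category = defaultdict(list)
    for ann in coco["annotations"]:
        by_category[ann["category_id"]].append(ann)

    rng = random.Random(seed)
    queries = {}
    for cat_id, anns in by_category.items():
        # Sample up to k instances; categories with fewer keep all of them.
        queries[cat_id] = rng.sample(anns, k) if len(anns) > k else list(anns)
    return queries

# Usage: vision_queries = extract_vision_queries("train_annotations.json", k=5)
# Each sampled box would then be cropped from its image and encoded by the
# vision encoder to form the per-category query bank.
```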
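
For reference, the modulated pre-training settings quoted in the Experiment Setup, Open Datasets, and Hardware rows could be collected into a single config as sketched below. The key names are assumptions for illustration only and do not reflect the actual MQ-Det/GLIP config schema.

```python
# Hypothetical config sketch for the modulated pre-training stage,
# reflecting the Table VI values and training details quoted above.
modulated_pretraining_config = {
    "dataset": "Objects365",         # pre-training data (Open Datasets row)
    "epochs": 1,                     # one epoch (Hardware Specification row)
    "gpus": 8,                       # 8x NVIDIA V100
    "gcp_learning_rate": 1e-5,       # lr of GCP (Table VI)
    "vision_query_mask_rate": 0.40,  # 40% mask rate (Table VI)
    # All remaining hyper-parameters follow the corresponding
    # language-queried detector (e.g., GLIP) defaults.
}
```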