LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors

Authors: Sheng Jin, Xueying Jiang, Jiaxing Huang, Lewei Lu, Shijian Lu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments over multiple large-scale benchmarks show that DVDet outperforms the state-of-the-art consistently by large margins.
Researcher Affiliation | Collaboration | Sheng Jin1, Xueying Jiang1, Jiaxing Huang1, Lewei Lu2, Shijian Lu1; 1 S-Lab, Nanyang Technological University; 2 SenseTime Research
Pseudocode | No | The paper describes the methods in prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link indicating the release of source code for the described methodology.
Open Datasets | Yes | We evaluated DVDet over two widely adopted benchmarks, i.e., COCO (Lin et al., 2014) and LVIS (Gupta et al., 2019).
Dataset Splits | Yes | For the COCO dataset, we follow OV-RCNN (Zareian et al., 2021) to split the object categories into 48 base categories and 17 novel categories. As in (Zareian et al., 2021), we keep 107,761 images with base class annotations as the training set and 4,836 images with base and novel class annotations as the validation set. (A split sketch is given after this table.)
Hardware Specification | Yes | All experiments are conducted on 4 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions using a 'CLIP text encoder' but does not specify version numbers for any software libraries, frameworks, or dependencies used in the experiments.
Experiment Setup | Yes | For the warmup, we increase the learning rate from 0 to 0.002 for the first 1,000 iterations. The model is trained for 5,000 iterations using SGD optimizer with batch size 8 and the learning rate is scaled down by a factor of 10 at 6,000 and 8,000 iterations. ... For the warmup, we increase the learning rate from 0 to 2e-4 for the first 1,000 iterations. The model is trained for 10,000 iterations using Adam optimizer with batch size 8. (A training-schedule sketch follows this table.)
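
For the dataset-splits row, the following is a minimal sketch of how the reported OV-COCO 48 base / 17 novel split could be materialized with pycocotools. The category names shown are illustrative examples only; the complete 48/17 lists follow OV-RCNN (Zareian et al., 2021) and are not reproduced here, and the annotation paths and helper names are our assumptions, not the authors' released code.

```python
from pycocotools.coco import COCO

# Illustrative subsets only; the full 48 base / 17 novel name lists
# come from OV-RCNN (Zareian et al., 2021) and must be filled in.
BASE_CATEGORIES = ["person", "bicycle", "car"]
NOVEL_CATEGORIES = ["airplane", "bus", "cat"]

def images_with_categories(ann_file, names):
    """Return IDs of images that contain at least one annotation from `names`."""
    coco = COCO(ann_file)
    cat_ids = coco.getCatIds(catNms=list(names))
    image_ids = set()
    for cid in cat_ids:
        image_ids.update(coco.getImgIds(catIds=[cid]))
    return sorted(image_ids)

# Training set: images carrying base-class annotations (107,761 in the paper);
# novel-class boxes would be dropped from these images during training.
train_ids = images_with_categories(
    "annotations/instances_train2017.json", BASE_CATEGORIES
)

# Validation set: images annotated with base and novel classes (4,836 in the paper).
val_ids = images_with_categories(
    "annotations/instances_val2017.json", BASE_CATEGORIES + NOVEL_CATEGORIES
)
```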
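
For the experiment-setup row, this is a minimal PyTorch sketch of the quoted COCO schedule: linear warmup from 0 to 0.002 over the first 1,000 iterations with SGD, then the learning rate divided by 10 at the reported decay points. The stand-in model, the momentum value, and the training loop are assumptions; the paper does not release training code.

```python
import torch

# Stand-in module; the actual DVDet detector is not released.
model = torch.nn.Linear(512, 65)

PEAK_LR = 0.002               # quoted peak learning rate (COCO setting)
WARMUP_ITERS = 1_000          # linear warmup from 0 to PEAK_LR
DECAY_ITERS = (6_000, 8_000)  # LR divided by 10 at each point, as quoted

# Momentum 0.9 is an assumption; the paper only names "SGD optimizer".
optimizer = torch.optim.SGD(model.parameters(), lr=PEAK_LR, momentum=0.9)

def lr_at(iteration: int) -> float:
    """Piecewise learning rate implied by the quoted setup."""
    if iteration < WARMUP_ITERS:
        return PEAK_LR * iteration / WARMUP_ITERS
    lr = PEAK_LR
    for step in DECAY_ITERS:
        if iteration >= step:
            lr *= 0.1
    return lr

for it in range(10_000):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(it)
    # ... forward pass on a batch of 8, loss, backward, optimizer.step() ...
```

The LVIS setting quoted in the same row (Adam, peak learning rate 2e-4, 10,000 iterations, batch size 8) would follow the same pattern with `torch.optim.Adam` and the corresponding constants.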