Learning Object-Language Alignments for Open-Vocabulary Object Detection
Authors: Chuang Lin, Peize Sun, Yi Jiang, Ping Luo, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan, Jianfei Cai
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on two benchmark datasets, COCO and LVIS, demonstrate our superior performance over the competing approaches on novel categories, e.g., achieving 32.0% mAP on COCO and 21.7% mask mAP on LVIS. |
| Researcher Affiliation | Collaboration | 1 Monash University, 2 ByteDance, 3 The University of Hong Kong |
| Pseudocode | No | The paper describes the approach using textual explanations and mathematical formulations but does not include a distinct pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available at: https://github.com/clin1223/VLDet. |
| Open Datasets | Yes | COCO and COCO Caption. Following the open-vocabulary COCO setting (OV-COCO) (Zareian et al., 2021), the COCO-2017 dataset is manually divided into 48 base classes and 17 novel classes, which were proposed for zero-shot object detection (Bansal et al., 2018). ... For image-text pair data, we use the COCO Caption (Chen et al., 2015) training set, which contains 5 human-generated captions per image. |
| Dataset Splits | Yes | We keep 107,761 images with base-class annotations as the training set and 4,836 images with base- and novel-class annotations as the validation set. (See the split-filtering sketch below the table.) |
| Hardware Specification | Yes | All the experiments are conducted on 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions software components like Faster R-CNN, CLIP, and CenterNet2, but does not provide specific version numbers for these or other relevant software dependencies (e.g., programming language versions, specific library versions). |
| Experiment Setup | Yes | In each mini-batch, the ratio of base-class detection data to image-text pair data is 1:4. For warmup, the learning rate is increased from 0 to 0.002 over the first 1,000 iterations. The model is trained for 90,000 iterations using the SGD optimizer with batch size 8, and the learning rate is scaled down by a factor of 10 at 60,000 and 80,000 iterations. (See the training-schedule sketch below the table.) |
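
The OV-COCO protocol in the "Open Datasets" and "Dataset Splits" rows can be made concrete with a short `pycocotools` sketch. This is a simplified reconstruction under stated assumptions, not the benchmark's official tooling: the class-name lists are truncated placeholders (the benchmark fixes 48 base and 17 novel names), the annotation path is hypothetical, and in the actual setting the 107,761 training images and 4,836 validation images are obtained by filtering the corresponding COCO-2017 splits.

```python
# Hedged sketch of the OV-COCO image filtering; class lists and paths are
# placeholders, not the benchmark's actual definition files.
from pycocotools.coco import COCO

BASE_CLASSES = ["person", "bicycle", "car"]  # placeholder: 48 base names in the benchmark
NOVEL_CLASSES = ["airplane", "bus", "cat"]   # placeholder: 17 novel names in the benchmark


def split_image_ids(ann_file: str):
    """Keep images with at least one base-class box for training, and images
    with any base- or novel-class box for evaluation."""
    coco = COCO(ann_file)
    base_ids = set(coco.getCatIds(catNms=BASE_CLASSES))
    novel_ids = set(coco.getCatIds(catNms=NOVEL_CLASSES))

    train_ids, val_ids = [], []
    for img_id in coco.getImgIds():
        cats = {ann["category_id"]
                for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id))}
        if cats & base_ids:                # trainable: has base-class boxes
            train_ids.append(img_id)
        if cats & (base_ids | novel_ids):  # evaluable: has base or novel boxes
            val_ids.append(img_id)
    return train_ids, val_ids


if __name__ == "__main__":
    # Hypothetical path to a COCO-2017 annotation file.
    train_ids, val_ids = split_image_ids("annotations/instances_train2017.json")
    print(f"{len(train_ids)} candidate training images, {len(val_ids)} evaluable images")
```

In practice, novel-class boxes would also be stripped from the retained training images, so that only base-class supervision remains.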
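
Likewise, the optimization recipe in the "Experiment Setup" row maps onto a small PyTorch loop. The following is a minimal sketch under stated assumptions rather than the released training code: the model is a stand-in module, momentum and weight decay are omitted because the excerpt does not specify them, and the 1:4 data mixing is only described in a comment.

```python
# Hedged sketch of the quoted schedule: linear warmup to 0.002 over 1,000
# iterations, then 10x decays at 60,000 and 80,000, for 90,000 iterations total.
import torch

BASE_LR = 0.002
WARMUP_ITERS = 1_000
DECAY_STEPS = (60_000, 80_000)
MAX_ITERS = 90_000


def lr_at(it: int) -> float:
    """Learning rate at iteration `it` under warmup + multi-step decay."""
    if it < WARMUP_ITERS:
        return BASE_LR * it / WARMUP_ITERS
    n_decays = sum(it >= step for step in DECAY_STEPS)
    return BASE_LR * (0.1 ** n_decays)


model = torch.nn.Linear(4, 4)  # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=BASE_LR)

for it in range(MAX_ITERS):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(it)
    # A real iteration would draw a mixed mini-batch of total size 8
    # (base-class detection data to image-text pairs at a 1:4 ratio),
    # compute the loss, then run loss.backward(), optimizer.step(),
    # and optimizer.zero_grad().
```

Printing `lr_at` at a few checkpoints (e.g., iterations 999, 60,000, and 80,000) confirms the 0.002 → 0.0002 → 0.00002 progression quoted above.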