CoDet: Co-occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Authors: Chuofan Ma, Yi Jiang, Xin Wen, Zehuan Yuan, Xiaojuan Qi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments: 4.1 Benchmark Setup; 4.2 Implementation Details; 4.3 Benchmark Results; 4.4 Transfer to Other Datasets; 4.5 Visualization and Analysis; 4.6 Ablation Study |
| Researcher Affiliation | Collaboration | ¹The University of Hong Kong, ²ByteDance Inc. |
| Pseudocode | No | The paper describes its methods using text and equations, but does not provide a formal pseudocode block or algorithm. |
| Open Source Code | Yes | Code is available at https://github.com/CVMI-Lab/CoDet. |
| Open Datasets | Yes | OV-LVIS is a general benchmark for open-vocabulary object detection, built upon LVIS [19] dataset which contains a diverse set of 1203 categories of objects with a long-tail distribution. Besides, we choose CC3M [40] which contains 2.8 million free-form image-text pairs crawled from the web, as the source of image-text pairs. OV-COCO is derived from the popular COCO [29] benchmark... we use COCO Caption [7] training set which provides 5 human-generated captions for each image for experiments on OV-COCO. |
| Dataset Splits | No | The paper mentions using a "COCO validation set" in Section 4.5, but it does not specify explicit splits (e.g., percentages or counts) for training, validation, or test sets. It refers to 'standard practice' for LVIS and category splits, but not dataset splits. |
| Hardware Specification | No | The paper mentions training on "8 GPUs" but does not specify any particular GPU model (e.g., NVIDIA A100, RTX series), CPU model, or other specific hardware configurations. |
| Software Dependencies | No | The paper mentions using frameworks and models like "CenterNet2", "ResNet50", "Faster R-CNN", "Mask-RCNN", and "CLIP", but it does not provide specific version numbers for these or for any programming languages or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | Table 6: Hyper-parameter configuration of CoDet (values given as OV-LVIS / OV-COCO; rows with one value list only a single entry in the source). Optimizer: AdamW / SGD; Learning rate (LR): 2e-4 / 2e-2; Total iterations: 90k / 90k; Warmup iterations: 1k; Step decay factor: 0.1; Step decay schedule: [60k, 80k]; Batch size (detection): 8 / 2; Batch size (caption): 32 / 8; Detection/Caption data ratio: 1:4 / 1:4; L_region-word weight: 0.2 / 0.1; L_image-text weight: 0.2 / 0.1 |
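
The warmup and step-decay entries in the Experiment Setup row can be read as a learning-rate schedule. The sketch below is an illustration only, not the authors' code: the function name `lr_at`, the linear warmup shape, and the `warmup_start_factor` value are assumptions, and because the row does not say which dataset the warmup and decay settings belong to, the base learning rate is left as a parameter.

```python
def lr_at(iteration, base_lr, warmup_iters=1_000,
          milestones=(60_000, 80_000), gamma=0.1,
          warmup_start_factor=0.001):
    """Learning rate at `iteration` under warmup followed by step decay.

    Defaults mirror the table row: 1k warmup iterations, decay factor 0.1,
    decay steps at 60k and 80k. The linear warmup shape and its start
    factor are assumptions; the table does not specify them.
    """
    if iteration < warmup_iters:
        # Assumed linear ramp from warmup_start_factor * base_lr to base_lr.
        alpha = iteration / warmup_iters
        return base_lr * (warmup_start_factor * (1 - alpha) + alpha)
    # Multiply by gamma once for every milestone already reached.
    n_decays = sum(iteration >= m for m in milestones)
    return base_lr * gamma ** n_decays


if __name__ == "__main__":
    # Base LR 2e-2 is the OV-COCO value listed in the table.
    for it in (0, 500, 1_000, 59_999, 60_000, 80_000, 89_999):
        print(it, lr_at(it, base_lr=2e-2))
```

With these defaults the rate stays at the base value from iteration 1k to 60k, drops by a factor of 10 at 60k, and drops by another factor of 10 at 80k, for the 90k total iterations listed.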