CLIM: Contrastive Language-Image Mosaic for Region Representation
Authors: Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that CLIM improves different baseline open-vocabulary object detectors by a large margin on both OV-COCO and OV-LVIS benchmarks. |
| Researcher Affiliation | Collaboration | S-Lab, Nanyang Technological University; The Chinese University of Hong Kong; The University of Hong Kong; SenseTime Research and Tetras.AI; Shanghai AI Laboratory |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/wusize/CLIM. |
| Open Datasets | Yes | We follow OV-RCNN (Zareian et al. 2021) to divide COCO dataset (Lin et al. 2014) into 48 base classes and 17 novel classes. |
| Dataset Splits | No | The paper states 'The training set contains 107,761 images of base category annotations, and the test set contains 4,836 images', but does not explicitly mention a separate validation split or its size. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software components like Faster R-CNN, CenterNet2, CLIP models, and the AdamW optimizer, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For Detic (Zhou et al. 2022), we use the Faster R-CNN with ResNet-C4 (Ren et al. 2015) backbone as the detector on the OV-COCO benchmark, and use the detector based on CenterNet2 (Zhou, Koltun, and Krähenbühl 2021) on the OV-LVIS benchmark. For the experiment on OV-COCO, we train the CLIP model on COCO Caption (Chen et al. 2015) for 100 epochs. ... we use the AdamW optimizer and set the batch size to 128 and the learning rate to 1e-5. |
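For context, the optimization settings quoted in the Experiment Setup row (AdamW, batch size 128, learning rate 1e-5, 100 epochs on COCO Caption) can be summarized as a minimal PyTorch-style sketch. The `model` and `train_loader` objects, and the assumption that the model returns a contrastive loss directly, are illustrative placeholders and are not taken from the CLIM repository.

```python
import torch

def finetune_clip(model, train_loader, epochs=100, lr=1e-5):
    """Sketch of the reported fine-tuning recipe: AdamW, lr 1e-5, 100 epochs.

    Assumes `train_loader` yields batches of 128 (image, caption) pairs and that
    calling `model(images, captions)` returns a scalar contrastive loss
    (hypothetical interface, not the authors' implementation).
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, captions in train_loader:
            loss = model(images, captions)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```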