CLIM: Contrastive Language-Image Mosaic for Region Representation

Authors: Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental results demonstrate that CLIM improves different baseline open-vocabulary object detectors by a large margin on both OV-COCO and OV-LVIS benchmarks."
Researcher Affiliation | Collaboration | (1) S-Lab, Nanyang Technological University; (2) The Chinese University of Hong Kong; (3) The University of Hong Kong; (4) SenseTime Research and Tetras.AI; (5) Shanghai AI Laboratory
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code is available at https://github.com/wusize/CLIM."
Open Datasets | Yes | "We follow OV-RCNN (Zareian et al. 2021) to divide COCO dataset (Lin et al. 2014) into 48 base classes and 17 novel classes." (an illustrative base/novel filtering sketch follows the table)
Dataset Splits | No | The paper states 'The training set contains 107,761 images of base category annotations, and the test set contains 4,836 images', but does not explicitly mention a separate validation split or its size.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions software components such as Faster R-CNN, CenterNet2, CLIP models, and the AdamW optimizer, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | "For Detic (Zhou et al. 2022), we use the Faster R-CNN with ResNet-C4 (Ren et al. 2015) backbone as the detector on OV-COCO benchmark, and use the detector based on CenterNet2 (Zhou, Koltun, and Krähenbühl 2021) on OV-LVIS benchmark. For the experiment on OV-COCO, we train the CLIP model on COCO Caption (Chen et al. 2015) for 100 epochs. ... we use AdamW optimizer and set the batch size to 128 and the learning rate to 1e-5." (a minimal training-configuration sketch follows the table)
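
As a companion to the "Open Datasets" row, the snippet below sketches one way to derive a base-class-only training annotation file from standard COCO-format annotations. It is an illustrative guess, not the authors' preprocessing code: BASE_CLASS_NAMES is a placeholder subset, and the actual 48 base / 17 novel category lists come from OV-RCNN (Zareian et al. 2021).

```python
import json

# Placeholder subset of base-class names; the real 48 base / 17 novel split
# follows OV-RCNN (Zareian et al. 2021) and is not reproduced here.
BASE_CLASS_NAMES = {"person", "bicycle", "car"}

def filter_to_base_classes(ann_path: str, out_path: str) -> None:
    """Keep only images/annotations of base categories in a COCO-style JSON file."""
    with open(ann_path) as f:
        coco = json.load(f)

    base_cats = [c for c in coco["categories"] if c["name"] in BASE_CLASS_NAMES]
    base_cat_ids = {c["id"] for c in base_cats}

    # Annotations of novel categories are dropped from training; they are only
    # used for evaluation on the open-vocabulary benchmark.
    base_anns = [a for a in coco["annotations"] if a["category_id"] in base_cat_ids]
    kept_img_ids = {a["image_id"] for a in base_anns}
    base_imgs = [im for im in coco["images"] if im["id"] in kept_img_ids]

    with open(out_path, "w") as f:
        json.dump({"images": base_imgs, "annotations": base_anns,
                   "categories": base_cats}, f)
```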
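
The hyperparameters quoted in the "Experiment Setup" row can be summarised in a minimal PyTorch-style loop. This is only a hedged sketch of the reported settings (AdamW, batch size 128, learning rate 1e-5, 100 epochs of CLIP fine-tuning on COCO Caption); the model and dataset below are dummy stand-ins rather than the CLIP encoders and COCO Caption data, and the actual implementation is in the released repository at https://github.com/wusize/CLIM.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins: a linear layer in place of the CLIP encoders, and random
# tensors in place of COCO Caption image/text pairs.
model = torch.nn.Linear(512, 512)
dataset = TensorDataset(torch.randn(1024, 512), torch.randn(1024, 512))

# Reported settings: batch size 128, AdamW optimizer, learning rate 1e-5.
loader = DataLoader(dataset, batch_size=128, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Reported schedule: 100 epochs for the OV-COCO experiment.
for epoch in range(100):
    for image_feats, text_feats in loader:
        # Stand-in objective; CLIM itself optimises a contrastive loss between
        # region features from mosaicked images and caption text embeddings.
        loss = torch.nn.functional.mse_loss(model(image_feats), text_feats)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```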