Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation

Authors: Yun Xing, Jian Kang, Aoran Xiao, Jiahao Nie, Ling Shao, Shijian Lu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments over a broad suite of 8 segmentation benchmarks show that CoCu achieves superb zero-shot transfer performance and greatly boosts language-supervised segmentation baseline by a large margin, suggesting the value of bridging semantic gap in pretraining data.
Researcher Affiliation | Collaboration | Yun Xing¹, Jian Kang¹, Aoran Xiao¹, Jiahao Nie¹, Ling Shao², Shijian Lu¹; ¹Nanyang Technological University, ²UCAS-Terminus AI Lab, UCAS, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/xing0047/rewrite.
Open Datasets | Yes | We follow the prior study [43] and conduct pre-training on three publicly available image-text datasets: CC3M (C3) [36], CC12M (C12) [8], YFCC14M (Y14) [39].
Dataset Splits | Yes | We benchmark zero-shot transfer performance of CoCu on the validation splits of eight different datasets that cover a myriad of scenes and category sets, including Pascal VOC [15], Pascal Context [30], COCO [27], ImageNet-S-50, ImageNet-S-300 [17], COCO Stuff [5], Cityscapes [12], and ADE20K [50].
Hardware Specification | Yes | We set the global batch size for contrastive learning as 1,024 and use 4 Tesla V100 GPUs to carry out pre-training for all experiments.
Software Dependencies | Yes | For efficient semantic searching, we build indexing systems using autofaiss.
Experiment Setup | Yes | We set the global batch size for contrastive learning as 1,024 and use 4 Tesla V100 GPUs to carry out pre-training for all experiments. Consistent with [43], we set the initial learning rate to 0.0016. The pre-training undergoes 30 epochs, with a linear warmup for the first 2 epochs and a cosine schedule for the remaining epochs. L is set to 3.
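
The schedule quoted in the Experiment Setup row can be illustrated with a short sketch. This is not the authors' training code. It only encodes the stated numbers (initial learning rate 0.0016, 30 epochs, 2 warmup epochs, global batch size 1,024, 4 GPUs). The warmup starting value of 0, the final learning rate `MIN_LR = 0.0`, the even per-GPU batch split, and the helper name `lr_at` are all assumptions made for illustration, since the source does not specify them.

```python
import math

BASE_LR = 0.0016      # initial learning rate stated in the setup
TOTAL_EPOCHS = 30     # total pre-training epochs stated in the setup
WARMUP_EPOCHS = 2     # linear warmup length stated in the setup
GLOBAL_BATCH = 1024   # global batch size stated in the setup
NUM_GPUS = 4          # number of Tesla V100 GPUs stated in the setup
MIN_LR = 0.0          # assumption: the final learning rate is not given in the source


def lr_at(epoch: float) -> float:
    """Learning rate at a (possibly fractional) epoch in [0, TOTAL_EPOCHS]."""
    if epoch < WARMUP_EPOCHS:
        # Assumption: warmup ramps linearly from 0 up to BASE_LR.
        return BASE_LR * epoch / WARMUP_EPOCHS
    # Cosine decay from BASE_LR down to MIN_LR over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


# Assumption: the global batch is split evenly across the GPUs.
per_gpu_batch = GLOBAL_BATCH // NUM_GPUS

if __name__ == "__main__":
    for e in (0, 1, 2, 16, 30):
        print(f"epoch {e:>2}: lr = {lr_at(e):.6f}")
    print("per-GPU batch size:", per_gpu_batch)
```

Under the even-split assumption, each GPU processes 256 samples per step. In practice such schedules are usually evaluated per optimizer step rather than per epoch, so passing a fractional epoch to `lr_at` is the closer analogue.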