Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation
Authors: Yun Xing, Jian Kang, Aoran Xiao, Jiahao Nie, Ling Shao, Shijian Lu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments over a broad suite of 8 segmentation benchmarks show that CoCu achieves superb zero-shot transfer performance and greatly boosts the language-supervised segmentation baseline, suggesting the value of bridging the semantic gap in pre-training data. |
| Researcher Affiliation | Collaboration | Yun Xing¹, Jian Kang¹, Aoran Xiao¹, Jiahao Nie¹, Ling Shao², Shijian Lu¹; ¹Nanyang Technological University; ²UCAS-Terminus AI Lab, UCAS, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/xing0047/rewrite. |
| Open Datasets | Yes | We follow the prior study [43] and conduct pre-training on three publicly available image-text datasets: CC3M (C3) [36], CC12M (C12) [8], YFCC14M (Y14) [39]. |
| Dataset Splits | Yes | We benchmark zero-shot transfer performance of CoCu on the validation splits of eight different datasets that cover a myriad of scenes and category sets, including Pascal VOC [15], Pascal Context [30], COCO [27], ImageNet-S-50, ImageNet-S-300 [17], COCO Stuff [5], Cityscapes [12], and ADE20K [50]. |
| Hardware Specification | Yes | We set the global batch size for contrastive learning as 1,024 and use 4 Tesla V100 GPUs to carry out pre-training for all experiments. |
| Software Dependencies | Yes | For efficient semantic searching, we build indexing systems using autofaiss. |
| Experiment Setup | Yes | We set the global batch size for contrastive learning as 1,024 and use 4 Tesla V100 GPUs to carry out pre-training for all experiments. Consistent with [43], we set the initial learning rate to 0.0016. The pre-training undergoes 30 epochs, with a linear warmup for the first 2 epochs and a cosine schedule for the remaining epochs. L is set to 3. |
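The quoted setup (initial learning rate 0.0016, 30 epochs, linear warmup for the first 2 epochs, cosine schedule thereafter) can be sketched as a per-epoch schedule. This is an illustrative reconstruction, not the authors' code; the exact per-step behavior and final learning rate in the paper may differ.

```python
import math

def lr_at_epoch(epoch, base_lr=0.0016, warmup_epochs=2, total_epochs=30):
    """Sketch of the reported schedule: linear warmup for the first
    2 epochs, then cosine decay over the remaining 28 epochs.
    (Hypothetical helper; assumes decay to zero at the end.)"""
    if epoch < warmup_epochs:
        # Linear ramp from 0 up to base_lr across the warmup epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr toward 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Example: lr_at_epoch(1) -> 0.0016 (peak at the end of warmup)
```

In practice such a schedule is usually applied per optimizer step rather than per epoch (e.g. via a framework scheduler), but the shape is the same.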