Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation
Authors: Yun Xing, Jian Kang, Aoran Xiao, Jiahao Nie, Ling Shao, Shijian Lu
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments over a broad suite of 8 segmentation benchmarks show that Co Cu achieves superb zeroshot transfer performance and greatly boosts language-supervised segmentation baseline by a large margin, suggesting the value of bridging semantic gap in pretraining data. |
| Researcher Affiliation | Collaboration | Yun Xing1 Jian Kang1 Aoran Xiao1 Jiahao Nie1 Ling Shao2 Shijian Lu1 1 Nanyang Technological University 2 UCAS-Terminus AI Lab, UCAS, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/xing0047/rewrite. |
| Open Datasets | Yes | We follow the prior study [43] and conduct pre-training on three publicly available image-text datasets: CC3M (C3) [36], CC12M (C12) [8], YFCC14M (Y14) [39]. |
| Dataset Splits | Yes | We benchmark zero-shot transfer performance of Co Cu on the validation splits of eight different datasets that cover a myriad of scenes and category sets, including Pascal VOC [15], Pascal Context [30], COCO [27], Image Net-S-50, Image Net-S-300 [17], COCO Stuff [5], Cityscapes [12], and ADE20K [50]. |
| Hardware Specification | Yes | We set the global batch size for contrastive learning as 1,024 and use 4 Tesla V100 GPUs to carry out pre-training for all experiments. |
| Software Dependencies | Yes | For efficient semantic searching, we build indexing systems using autofaiss 3. |
| Experiment Setup | Yes | We set the global batch size for contrastive learning as 1,024 and use 4 Tesla V100 GPUs to carry out pre-training for all experiments. Consistent with [43], we set the initial learning rate to 0.0016. The pre-training undergoes 30 epochs, with a linear warmup for the first 2 epochs and a cosine schedule for the remaining epochs. L is set to 3. |