Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

OVS Meets Continual Learning: Towards Sustainable Open-Vocabulary Segmentation

Authors: Dongjun Hwang, Yejin Kim, Minyoung Lee, Seong Joon Oh, Junsuk Choe

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through extensive experiments, we show that ConOVS consistently outperforms existing methods across pre-training, incremental, and zero-shot test datasets, effectively expanding the recognition capabilities of OVS models when data is collected sequentially. Code is available at: https://github.com/dongjunhwang/ConOVS
Researcher Affiliation	Academia	Dongjun Hwang¹, Yejin Kim¹, Minyoung Lee¹, Seong Joon Oh²˒³, Junsuk Choe¹ (¹Sogang University, ²University of Tübingen, ³Tübingen AI Center)
Pseudocode	Yes	Algorithm 1 Interpolation factor estimator
Open Source Code	Yes	Code is available at: https://github.com/dongjunhwang/ConOVS
Open Datasets	Yes	In Scenario 1 (S1), the model is pre-trained on COCO [26], incrementally trained on Cityscapes [7], and evaluated on ADE20K [57] as the zero-shot test set. In Scenario 2 (S2), the model is again pre-trained on COCO but incrementally trained on ADE20K, with Cityscapes used for zero-shot evaluation. In Scenario 3 (S3), the model is pre-trained on COCO and incrementally trained on both Cityscapes and ADE20K. For zero-shot evaluation, we use a diverse collection of datasets: LVIS [10], BDD100K [51], Mapillary Vistas [33], PC-59, PC-459 [31], PAS-20, PAS-21 [8], and A-847 [57].
Dataset Splits	Yes	In Scenario 1 (S1), the model is pre-trained on COCO [26], incrementally trained on Cityscapes [7], and evaluated on ADE20K [57] as the zero-shot test set. In Scenario 2 (S2), the model is again pre-trained on COCO but incrementally trained on ADE20K, with Cityscapes used for zero-shot evaluation. In Scenario 3 (S3), the model is pre-trained on COCO and incrementally trained on both Cityscapes and ADE20K.
Hardware Specification	Yes	All experiments are run on two NVIDIA A5000 GPUs.
Software Dependencies	No	The paper mentions applying the method to "fc-clip with ConvNeXt-L [27]" and "X-Decoder with Focal-L [50]", and refers to CLIP [36], but does not provide specific version numbers for software libraries, frameworks, or programming languages used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	We apply our method to two OVS models: fc-clip with ConvNeXt-L [27] and X-Decoder with Focal-L [50]. During the pre-training phase, fc-clip trains only the decoder, while X-Decoder trains both the encoder and decoder. In the fine-tuning phase, both models train only the decoder. The temperature T in the softmax is set to 0.01, and log-likelihood is used to compute probabilities from the MVN distributions. All methods were trained with the same number of iterations to ensure a fair comparison, and detailed information on the training cost of each method is provided in Appendix D.1.
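The Experiment Setup row says only that a softmax with temperature T = 0.01 is applied to log-likelihoods from MVN (multivariate normal) distributions, and the Pseudocode row names Algorithm 1, an "Interpolation factor estimator". The sketch below is one plausible reading under stated assumptions, not the authors' code: it assumes one multivariate normal is fit to feature vectors from each training phase (pre-training and incremental), and that a softmax over a test feature's per-phase log-likelihoods yields the interpolation factors. The function names, the ridge term added to the covariance, and the synthetic 4-dimensional features are all hypothetical.

```python
import numpy as np

TEMPERATURE = 0.01  # T from the Experiment Setup row


def fit_mvn(features):
    """Fit a multivariate normal (mean, covariance) to an (n, d) feature matrix.

    Hypothetical helper: the ridge term keeps the covariance invertible and is
    an assumption, not something stated in the source entry.
    """
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + 1e-3 * np.eye(features.shape[1])
    return mean, cov


def mvn_log_likelihood(x, mean, cov):
    """Log-density of x under N(mean, cov), via slogdet and a linear solve."""
    d = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + mahalanobis_sq)


def interpolation_factors(x, distributions, temperature=TEMPERATURE):
    """Assumed estimator: softmax over per-distribution log-likelihoods divided by T."""
    logits = np.array([mvn_log_likelihood(x, m, c) for m, c in distributions])
    logits = logits / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Usage with synthetic features standing in for the two training phases.
rng = np.random.default_rng(0)
pretrain_feats = rng.normal(0.0, 1.0, size=(500, 4))
incremental_feats = rng.normal(5.0, 1.0, size=(500, 4))
dists = [fit_mvn(pretrain_feats), fit_mvn(incremental_feats)]

alpha = interpolation_factors(np.full(4, 5.0), dists)
print(alpha)  # nearly all weight on the incremental (second) distribution
```

Because the logits are divided by T = 0.01, a log-likelihood gap of only 0.05 already yields a weight ratio of e^5 ≈ 148, so under these assumptions the factors are close to one-hot unless the fitted distributions overlap heavily.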