Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
OVS Meets Continual Learning: Towards Sustainable Open-Vocabulary Segmentation
Authors: Dongjun Hwang, Yejin Kim, Minyoung Lee, Seong Joon Oh, Junsuk Choe
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we show that ConOVS consistently outperforms existing methods across pre-training, incremental, and zero-shot test datasets, effectively expanding the recognition capabilities of OVS models when data is collected sequentially. Code is available at: https://github.com/dongjunhwang/ConOVS |
| Researcher Affiliation | Academia | Dongjun Hwang¹, Yejin Kim¹, Minyoung Lee¹, Seong Joon Oh²,³, Junsuk Choe¹; ¹Sogang University, ²University of Tübingen, ³Tübingen AI Center |
| Pseudocode | Yes | Algorithm 1 Interpolation factor estimator |
| Open Source Code | Yes | Code is available at: https://github.com/dongjunhwang/ConOVS |
| Open Datasets | Yes | In Scenario 1 (S1), the model is pre-trained on COCO [26], incrementally trained on Cityscapes [7], and evaluated on ADE20K [57] as the zero-shot test set. In Scenario 2 (S2), the model is again pre-trained on COCO but incrementally trained on ADE20K, with Cityscapes used for zero-shot evaluation. In Scenario 3 (S3), the model is pre-trained on COCO and incrementally trained on both Cityscapes and ADE20K. For zero-shot evaluation, we use a diverse collection of datasets: LVIS [10], BDD100K [51], Mapillary Vistas [33], PC-59, PC-459 [31], PAS-20, PAS-21 [8], and A-847 [57]. |
| Dataset Splits | Yes | In Scenario 1 (S1), the model is pre-trained on COCO [26], incrementally trained on Cityscapes [7], and evaluated on ADE20K [57] as the zero-shot test set. In Scenario 2 (S2), the model is again pre-trained on COCO but incrementally trained on ADE20K, with Cityscapes used for zero-shot evaluation. In Scenario 3 (S3), the model is pre-trained on COCO and incrementally trained on both Cityscapes and ADE20K. |
| Hardware Specification | Yes | All experiments are run on two NVIDIA A5000 GPUs. |
| Software Dependencies | No | The paper mentions applying the method to "fc-clip with ConvNeXt-L [27]" and "X-Decoder with Focal-L [50]", and refers to CLIP [36], but does not provide specific version numbers for software libraries, frameworks, or programming languages used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We apply our method to two OVS models: fc-clip with ConvNeXt-L [27] and X-Decoder with Focal-L [50]. During the pre-training phase, fc-clip trains only the decoder, while X-Decoder trains both the encoder and decoder. In the fine-tuning phase, both models train only the decoder. The temperature T in the softmax is set to 0.01, and log-likelihood is used to compute probabilities from the MVN distributions. All methods were trained with the same number of iterations to ensure a fair comparison, and detailed information on the training cost of each method is provided in Appendix D.1. |
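
The Experiment Setup row mentions a softmax with temperature T = 0.01 applied to log-likelihoods from MVN (multivariate normal) distributions. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes diagonal covariances for brevity, and the function names, the 2-D feature, and the two example distributions are hypothetical. How the resulting weights feed into the method (for example, Algorithm 1) is not described in this summary.

```python
import math

def diag_mvn_log_likelihood(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance (an assumption)."""
    return -0.5 * sum(
        (xi - mi) ** 2 / vi + math.log(2.0 * math.pi * vi)
        for xi, mi, vi in zip(x, mean, var)
    )

def temperature_softmax(scores, temperature=0.01):
    """Softmax of scores / temperature, shifted by the max for numerical stability."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-D feature and two hypothetical per-dataset distributions,
# each given as (mean, per-dimension variance).
x = [0.1, -0.2]
distributions = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 3.0], [1.0, 1.0]),
]

log_liks = [diag_mvn_log_likelihood(x, m, v) for m, v in distributions]
weights = temperature_softmax(log_liks, temperature=0.01)
print(log_liks, weights)
```

With a temperature as low as 0.01, even modest gaps in log-likelihood are amplified, so the weights come out close to one-hot in favor of the best-matching distribution.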