Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Leveraging Depth and Language for Open-Vocabulary Domain-Generalized Semantic Segmentation

Authors: Siyu Chen, Ting Han, Chengzheng Fu, Changshe Zhang, Chaolei Wang, Jinhe Su, Guorong Cai, Meiliu Wu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comprehensive evaluation on these components demonstrates the effectiveness of our designs. Our proposed Vireo achieves the state-of-the-art performance and surpasses existing methods by a large margin in both domain generalization and open-vocabulary recognition, offering a unified and scalable solution for robust visual understanding in diverse and dynamic environments. ... 4 Experiments 4.1 Datasets & Evaluation Protocols 4.3 Performance Comparison 4.4 Ablation Study
Researcher Affiliation	Academia	Siyu Chen1,2 Ting Han2,3 * Chengzheng Fu4 Changshe Zhang5 Chaolei Wang3 Jinhe Su1 * Guorong Cai1 Meiliu Wu2 * 1 Jimei University, 2 University of Glasgow, 3 Sun Yat-sen University, 4 Nanjing University of Aeronautics and Astronautics, 5 Xidian University,
Pseudocode	No	The paper describes the methodology in Section 3 and uses architectural diagrams (Figure 2) to illustrate the framework, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code is available at https://github.com/SYCh/Vireo.
Open Datasets	Yes	We evaluate Vireo on six real-world datasets (Cityscapes [38], BDD100K [39], Mapillary [40], ACDC [41], ADE150 [42], and ADE847 [42]) and two synthetic datasets (GTA5 [43] and DELIVER [44]).
Dataset Splits	Yes	Cityscapes (City.) is an autonomous-driving dataset with 2,975 training images and 500 validation images, each at a resolution of 2048 × 1024. ... Following the existing DGSS evaluation protocol, we train on one dataset as the source domain and validate on multiple unseen target domains. The three standard evaluation setups are: (1) Cityscapes → ACDC; (2) GTA5 → Cityscapes, BDD100K, Mapillary; (3) Cityscapes → BDD100K, Mapillary, GTA5.
Hardware Specification	Yes	All experiments are conducted on an NVIDIA RTX A6000 GPU with a batch size of 8, taking approximately 14 hours to train and peaking at around 45 GB of GPU memory usage.
Software Dependencies	No	Our implementation is built upon the MMSegmentation [45] codebase.
Experiment Setup	Yes	We employ the AdamW optimizer with an initial learning rate of 1e-4, a weight decay of 0.05, epsilon set to 1e-8, and beta parameters of (0.9, 0.999). The total number of training iterations is 40,000, matching REIN, and we adopt a polynomial learning-rate decay schedule that reduces the learning rate to zero over 40,000 iterations with a decay power of 0.9 and no epoch-based warmup. Data augmentation comprises multi-scale resizing, random cropping (with fixed crop size and category-ratio constraint), random horizontal flipping, and photometric distortion.
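
Illustrative note on the Experiment Setup row: the learning-rate schedule quoted there can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code. It assumes the common closed-form polynomial decay, lr(t) = base_lr * (1 - t / T)^power, with a floor of zero. The function name `poly_lr` and the constant names are hypothetical; only the values 1e-4, 40,000, and 0.9 come from the quoted setup.

```python
# Hypothetical sketch of the polynomial learning-rate decay described in the
# Experiment Setup row. Only the numeric values are taken from the quoted text;
# the closed-form formula is an assumption about how the schedule is defined.

BASE_LR = 1e-4      # initial learning rate (from the quoted setup)
MAX_ITERS = 40_000  # total training iterations (from the quoted setup)
POWER = 0.9         # decay power (from the quoted setup)


def poly_lr(iteration, base_lr=BASE_LR, max_iters=MAX_ITERS, power=POWER, min_lr=0.0):
    """Return the learning rate at a given iteration under polynomial decay."""
    if not 0 <= iteration <= max_iters:
        raise ValueError("iteration must lie in [0, max_iters]")
    # Fraction of training that is still ahead, in [0, 1].
    remaining = 1.0 - iteration / max_iters
    return (base_lr - min_lr) * remaining ** power + min_lr


if __name__ == "__main__":
    # Print the schedule at a few checkpoints, with no warmup phase.
    for it in (0, 10_000, 20_000, 30_000, 40_000):
        print(f"iter {it:>6}: lr = {poly_lr(it):.3e}")
```

With a power below 1, the rate falls slightly slower than linearly for most of training and then drops to zero at the final iteration. The actual MMSegmentation scheduler used by the paper may differ in details such as update granularity.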