Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Towards Robust Pseudo-Label Learning in Semantic Segmentation: An Encoding Perspective

Authors: Wangkai Li, Rui Sun, Zhaoyang Li, Tianzhu Zhang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	5 Experiment. Datasets. We evaluate our approach on two standard benchmarks for synthetic-to-real adaptation of street scenes in the UDA task. The synthetic datasets include GTAv [67] (24,966 images) and SYNTHIA [69] (9,400 images). Cityscapes [19], a real-world urban dataset, serves as the target domain, with 2,975 training and 500 validation images. For the SSL setting, we use Cityscapes, PASCAL VOC 2012 [25], a generic object segmentation benchmark with 1,464 training and 1,449 validation images, along with an augmented set of 10,582 additional training images, and COCO [52], a challenging benchmark composed of 118k/5k training/validation images with 81 classes.
Researcher Affiliation	Academia	Wangkai Li¹, Rui Sun¹, Zhaoyang Li¹, Tianzhu Zhang¹,²; ¹ University of Science and Technology of China; ² National Key Laboratory of Deep Space Exploration, Deep Space Exploration Laboratory; EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 Reliable bit mining strategy. 1: Input: probability vector p_i ∈ [0, 1]^K; 2: Output: mask of the reliable part M_i ∈ {0, 1}^K; 3: Initialize: code matrix M, confidence threshold T, candidate set S_c = {}, M_i = {1}^K; 4: compute code distance for each class by Eq. 5; 5: sort the code distance and obtain sorted index I; 6: compute confidence q_i; 7: for n = 1 to N do; 8: add c_{I[n]} to S_c; 9: compute the shared part P_s(S_c); 10: update M_i with bit positions in P_s(S_c); 11: compute mean confidence q_i^m in P_s(S_c); 12: if q_i^m > T or M_i = {0}^K then; 13: break; 14: end if; 15: end for; 16: return M_i
Open Source Code	No	Code is available at https://github.com/Woof6/ECOCSeg. AND Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The code will be open-sourced to the community upon acceptance of the paper.
Open Datasets	Yes	Datasets. We evaluate our approach on two standard benchmarks for synthetic-to-real adaptation of street scenes in the UDA task. The synthetic datasets include GTAv [67] (24,966 images) and SYNTHIA [69] (9,400 images). Cityscapes [19], a real-world urban dataset, serves as the target domain, with 2,975 training and 500 validation images. For the SSL setting, we use Cityscapes, PASCAL VOC 2012 [25], a generic object segmentation benchmark with 1,464 training and 1,449 validation images, along with an augmented set of 10,582 additional training images, and COCO [52], a challenging benchmark composed of 118k/5k training/validation images with 81 classes.
Dataset Splits	Yes	Cityscapes [19], a real-world urban dataset, serves as the target domain, with 2,975 training and 500 validation images. For the SSL setting, we use Cityscapes, PASCAL VOC 2012 [25], a generic object segmentation benchmark with 1,464 training and 1,449 validation images, along with an augmented set of 10,582 additional training images, and COCO [52], a challenging benchmark composed of 118k/5k training/validation images with 81 classes.
Hardware Specification	Yes	Experiments are conducted on one RTX-3090 GPU for DACS and DAFormer, and two for MIC.
Software Dependencies	No	The network is trained for 40K iterations (batch size 2) using AdamW optimizer with learning rates of 6 × 10^-5 (encoder) and 6 × 10^-4 (decoder), weight decay of 0.01, and linear warm-up for the first 1.5K iterations.
Experiment Setup	Yes	UDA Setting. We evaluate ECOCSeg on three widely used frameworks, DACS [79] with ResNet101 [29] backbone, DAFormer [34], and MIC [35], with MIT-B5 [91] backbone. Experiments are conducted on one RTX-3090 GPU for DACS and DAFormer, and two for MIC. The network is trained for 40K iterations (batch size 2) using AdamW optimizer with learning rates of 6 × 10^-5 (encoder) and 6 × 10^-4 (decoder), weight decay of 0.01, and linear warm-up for the first 1.5K iterations. Images are rescaled and randomly cropped to 512 × 512 following DAFormer's augmentation, and the EMA coefficient for updating the teacher net is 0.999. SSL Setting. We implement our method on ST++ [95], FixMatch [72], UniMatch [93] and adopt DeepLabv3+ [10] with a ResNet [29] backbone as our segmentation model. For Pascal, we use a crop size of 321 × 321 and 513 × 513, a batch size of 8, and a learning rate of 0.001 with an SGD optimizer. The model is trained for 80 epochs using a poly learning rate scheduler on 2 RTX 3090 GPUs. More experiment settings are detailed in Appendix G. ECOCSeg Parameters. ECOCSeg uses M_text as the default codebook with codeword length K = 40 for Cityscapes and Pascal, and K = 60 for COCO. We set the loss weight λ1 = 5 and λ2 = 2 with the temperature τ = 0.5. The confidence threshold T for reliable bit mining is set to 0.95.
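
Illustrative sketch (not from the paper): the control flow of Algorithm 1 in the Pseudocode row can be approximated in plain Python as below. Several pieces are assumptions, because the quoted text does not define them: Eq. 5 is replaced by an L1 distance between the probability vector and each codeword, the per-bit confidence is taken as max(p, 1 - p), and the shared part is taken to be the bit positions on which all candidate codewords agree. The function name reliable_bit_mask and the toy code matrix are hypothetical.

```python
def reliable_bit_mask(p, code_matrix, threshold=0.95):
    """Return a 0/1 mask over the K bits of p that are treated as reliable.

    p           -- list of K per-bit probabilities in [0, 1]
    code_matrix -- list of N codewords, each a list of K bits (0 or 1)
    threshold   -- confidence threshold T (0.95 in the quoted setup)
    """
    num_bits = len(p)

    # ASSUMPTION: L1 distance stands in for Eq. 5, which is not quoted.
    distances = [
        sum(abs(pk - bit) for pk, bit in zip(p, codeword))
        for codeword in code_matrix
    ]
    # Class indices sorted from nearest to farthest codeword.
    order = sorted(range(len(code_matrix)), key=lambda c: distances[c])

    # ASSUMPTION: per-bit confidence is the distance of p from 0.5, folded.
    confidence = [max(pk, 1.0 - pk) for pk in p]

    mask = [1] * num_bits
    candidates = []
    for c in order:
        candidates.append(c)
        # ASSUMPTION: shared part = bits where all candidate codewords agree.
        shared = [
            k for k in range(num_bits)
            if len({code_matrix[j][k] for j in candidates}) == 1
        ]
        shared_set = set(shared)
        mask = [1 if k in shared_set else 0 for k in range(num_bits)]
        if not shared:
            break  # mask is all zeros, nothing left to trust
        mean_conf = sum(confidence[k] for k in shared) / len(shared)
        if mean_conf > threshold:
            break
    return mask


# Toy example: 3 classes, 4-bit codewords.
toy_codes = [[0, 0, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
print(reliable_bit_mask([0.02, 0.03, 0.45, 0.40], toy_codes))
```

In the toy call, the first two bits are confident and shared by the two nearest codewords, while the last two bits are ambiguous, so only the first two positions remain in the mask.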
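
Illustrative sketch (not from the paper): the linear warm-up and EMA teacher update mentioned in the Experiment Setup row can be written as two small helpers. The function names are hypothetical. The quoted text does not say where the warm-up starts or how the learning rate behaves afterwards, so this sketch assumes a start at zero and a constant rate after 1.5K iterations; the actual training code may decay the rate instead.

```python
def warmup_lr(step, base_lr, warmup_iters=1500):
    """Learning rate at a given iteration with linear warm-up.

    ASSUMPTION: ramps from 0 to base_lr, then stays constant.
    """
    if step < warmup_iters:
        return base_lr * step / warmup_iters
    return base_lr


def ema_update(teacher, student, alpha=0.999):
    """One EMA step: teacher <- alpha * teacher + (1 - alpha) * student.

    Weights are plain lists of floats here to keep the sketch dependency-free.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


# Encoder and decoder use different base rates in the quoted UDA setup.
encoder_lr = warmup_lr(750, 6e-5)   # halfway through warm-up
decoder_lr = warmup_lr(2000, 6e-4)  # past warm-up
teacher = ema_update([1.0, 0.0], [0.0, 1.0])
```

With a coefficient of 0.999, each step moves the teacher only 0.1% of the way toward the student, which is why the teacher changes slowly relative to the student.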