Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

RWMS: Reliable Weighted Multi-Phase for Semi-supervised Segmentation

Authors: Wensi Liu, Xiao-Yu Tang, Chong Yang, Chunjie Yang

AAAI 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through extensive experiments, we demonstrate that our method performs remarkably well compared to baseline methods and substantially outperforms them, more than 3% on VOC and Cityscapes.
Researcher Affiliation	Academia	College of Control Science and Engineering, Zhejiang University EMAIL
Pseudocode	Yes	The overall process is in Algorithm 1.
Open Source Code	No	The paper does not provide an explicit statement or link for the open-source code of the described methodology.
Open Datasets	Yes	Pascal VOC 2012 (Everingham et al. 2015) is an important standard semantic segmentation dataset, initially consisting of 1464 images for training and 1449 images for validation. In order to increase the number of training samples, previous researchers adopt a method of introducing relatively lower-quality annotations from the SBD dataset (Hariharan et al. 2011), forming an augmented training set with a total of 10582 images. Cityscapes (Cordts et al. 2016) is a genuine dataset of urban environments. The training and validation subsets consist of 2975 and 500 images, respectively.
Dataset Splits	Yes	Pascal VOC 2012 (Everingham et al. 2015) is an important standard semantic segmentation dataset, initially consisting of 1464 images for training and 1449 images for validation. In order to increase the number of training samples, previous researchers adopt a method of introducing relatively lower-quality annotations from the SBD dataset (Hariharan et al. 2011), forming an augmented training set with a total of 10582 images. Cityscapes (Cordts et al. 2016) is a genuine dataset of urban environments. The training and validation subsets consist of 2975 and 500 images, respectively.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies or libraries used in the experiments.
Experiment Setup	Yes	We initialize the backbone network by using pre-trained weights on ImageNet (Russakovsky et al. 2015). Training is performed using the SGD optimizer with the momentum of 0.9 and the weight decay of 0.0001. We employ a polynomial learning rate decay strategy, as used in previous works: $(1 - \frac{\text{iter}}{\text{max\_iter}})^{0.9}$ (Chen et al. 2021; Yang et al. 2022; Liu et al. 2022b). The training is conducted for 80 epochs on both datasets. Images are cropped to 321x321 on Pascal and 721x721 on Cityscapes. For labeled images, we apply weak data augmentation, including random scaling and random flipping. For unlabeled images, we additionally apply strong data augmentation, such as colorjitter, grayscale, and blur.
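
The polynomial learning rate decay quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `poly_lr`, the `STATED_SETUP` dictionary, and the base learning rate `BASE_LR` are hypothetical, and the quoted passage does not give a base learning rate, so the value below is a placeholder only.

```python
# Values quoted in the Experiment Setup row above.
STATED_SETUP = {
    "optimizer": "SGD",
    "momentum": 0.9,
    "weight_decay": 0.0001,
    "epochs": 80,
    "poly_power": 0.9,
    "crop_size": {"pascal": 321, "cityscapes": 721},
}

# Hypothetical placeholder: the base learning rate is NOT stated in the quoted text.
BASE_LR = 0.001


def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Return base_lr * (1 - cur_iter / max_iter) ** power."""
    if max_iter <= 0:
        raise ValueError("max_iter must be positive")
    # Clamp so the base of the power is never negative.
    cur_iter = min(max(cur_iter, 0), max_iter)
    return base_lr * (1.0 - cur_iter / max_iter) ** power


if __name__ == "__main__":
    max_iter = 1000  # hypothetical total number of iterations
    for it in (0, 250, 500, 750, 1000):
        lr = poly_lr(BASE_LR, it, max_iter, STATED_SETUP["poly_power"])
        print(f"iter {it:4d}: lr = {lr:.6f}")
```

With a power of 0.9 the schedule is close to linear: the rate starts at the base value and reaches zero at the final iteration.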