Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ScaleMatch: Multi-scale Consistency Enhancement for Semi-supervised Semantic Segmentation

Authors: Liang Lv, Lefei Zhang

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Consequently, our ScaleMatch enhances the model's generalization under scale variation, outperforming existing state-of-the-art methods on both the Pascal VOC and Cityscapes datasets under various partition protocols. Code https://github.com/lvliang6879/ScaleMatch...Experiments Datasets. We conduct experiments on two widely-used datasets, Pascal VOC 2012 and Cityscapes...Ablation Studies. We conduct ablation studies to verify the proposed strategies in ScaleMatch, reporting results for DeepLabV3+ with ResNet-101 on the Pascal VOC original dataset (513 × 513 training size).
Researcher Affiliation	Academia	National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University EMAIL
Pseudocode	No	The paper describes the methodology in text and uses diagrams (Figures 2 and 3) but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code https://github.com/lvliang6879/ScaleMatch
Open Datasets	Yes	Experiments Datasets. We conduct experiments on two widely-used datasets, Pascal VOC 2012 and Cityscapes. Pascal VOC 2012 is a SS benchmark (Everingham et al. 2015), consisting of 1,464 high-quality annotated images for training and 1,449 images for evaluation... Cityscapes is designed for semantic analysis of urban street scenes and includes 2,975 high-resolution images for training and 500 images for validation
Dataset Splits	Yes	We conduct experiments on two widely-used datasets, Pascal VOC 2012 and Cityscapes. Pascal VOC 2012...consisting of 1,464 high-quality annotated images for training and 1,449 images for evaluation. Additionally, we perform experiments on the augmented Pascal VOC 2012 dataset...totaling 10,582 training images. Cityscapes...includes 2,975 high-resolution images for training and 500 images for validation, primarily covering 19 categories within urban environments. Following prior research (Yang et al. 2023), we evaluate both datasets using various label partitions.
Hardware Specification	Yes	In all experiments, we implement our proposed method using the PyTorch framework and perform computations on four NVIDIA RTX 4090 GPUs (24GB VRAM each).
Software Dependencies	No	In all experiments, we implement our proposed method using the PyTorch framework. No specific version number is provided for PyTorch or any other software dependency.
Experiment Setup	Yes	Implementation Details. Consistent with previous research (Sun et al. 2024a), we employ DeepLabV3+ (Chen et al. 2018) as our network architecture and use a ResNet-101 pretrained on ImageNet as the backbone. For the Pascal dataset, we use an SGD optimizer with an initial learning rate of 0.001, weight decay of 1e-4, and crop sizes of either 321 × 321 or 513 × 513, with batch sizes of 16 and 8, respectively, over a total of 80 training epochs, and a confidence threshold τ of 0.95. For the Cityscapes dataset, we also use an AdamW optimizer with an initial learning rate of 0.00005, weight decay of 1e-2, a crop size of 801 × 801, and a batch size of 4, over a total of 240 training epochs with a confidence threshold τ of 0. The scale factors V are [0.25, 0.5, 1.5, 2.0], T are [0.75, 1.0, 1.25], and the weights for each loss function λ1, λ2, λ3 are set to 0.25, 0.25, and 0.5, respectively.
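
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch, which may be convenient when attempting a reproduction. The numeric values below are copied from the row above; the names (`TrainConfig`, `CONFIGS`, `weighted_loss`) are hypothetical and do not come from the paper or its repository. Combining the three loss terms as a weighted sum is an assumption about how λ1, λ2, λ3 are applied, not something this page confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for one training setup quoted in the table."""
    optimizer: str          # optimizer name as quoted ("SGD" or "AdamW")
    lr: float               # initial learning rate
    weight_decay: float
    crop_size: int          # square crop, e.g. 513 means 513 x 513
    batch_size: int
    epochs: int
    conf_threshold: float   # confidence threshold tau


# Values copied from the Experiment Setup row; the dictionary keys are illustrative.
CONFIGS = {
    "pascal_321": TrainConfig("SGD", 1e-3, 1e-4, 321, 16, 80, 0.95),
    "pascal_513": TrainConfig("SGD", 1e-3, 1e-4, 513, 8, 80, 0.95),
    "cityscapes_801": TrainConfig("AdamW", 5e-5, 1e-2, 801, 4, 240, 0.0),
}

SCALE_FACTORS_V = (0.25, 0.5, 1.5, 2.0)   # scale factors V
SCALE_FACTORS_T = (0.75, 1.0, 1.25)       # scale factors T
LOSS_WEIGHTS = (0.25, 0.25, 0.5)          # lambda1, lambda2, lambda3


def weighted_loss(l1, l2, l3, weights=LOSS_WEIGHTS):
    """Assumed combination: a plain weighted sum of three loss terms."""
    w1, w2, w3 = weights
    return w1 * l1 + w2 * l2 + w3 * l3


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, cfg)
```

Keeping the two Pascal crop sizes as separate entries makes the pairing of crop size and batch size (321 with 16, 513 with 8) explicit, which the prose states only through "respectively".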