Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Segment Anything Model Meets Semi-supervised Medical Image Segmentation: A Novel Perspective

Authors: Haifeng Zhao, Haiyang Li, Lei-Lei Ma, Dengdi Sun

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	5 Experiments We validate our proposed method on three widely-used semi-supervised medical image segmentation datasets: the LA dataset [44], the Brats-2019 dataset [45], and the PROMISE12 dataset [46]. Additionally, to facilitate a comprehensive comparison with existing prompt-free medical SAM variants, we conduct experiments on the Synapse Multi-Organ CT dataset [47].
Researcher Affiliation	Academia	¹Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, China; ²Key Lab Intelligent Comp & Signal Proc, Minist Educ, Anhui University, China; ³School of Artificial Intelligence, Anhui University, China
Pseudocode	No	The paper describes the methodology using textual explanations and diagrams (e.g., Figures 1, 2, and 3), but does not contain formal pseudocode or an algorithm block.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We are currently not providing open access to the data and code.
Open Datasets	Yes	We validate our proposed method on three widely-used semi-supervised medical image segmentation datasets: the LA dataset [44], the Brats-2019 dataset [45], and the PROMISE12 dataset [46]. Additionally, to facilitate a comprehensive comparison with existing prompt-free medical SAM variants, we conduct experiments on the Synapse Multi-Organ CT dataset [47].
Dataset Splits	Yes	Table 1: Comparisons with SOTA semi-supervised segmentation methods on the LA dataset (columns: labeled scans, unlabeled scans, DSC, Jaccard, 95HD, ASD). UA-MT [29]: 4 (5%) labeled / 76 (95%) unlabeled: DSC 82.26, Jaccard 70.98, 95HD 13.71, ASD 3.82; 8 (10%) labeled / 72 (90%) unlabeled: DSC 87.79, Jaccard 78.39, 95HD 8.68, ASD 2.12. ... Table 2: Comparisons with SOTA semi-supervised segmentation methods on the Brats-2019 dataset (same columns). DAN [53]: 25 (10%) labeled / 225 (90%) unlabeled: DSC 81.71, Jaccard 71.43, 95HD 15.15, ASD 2.32; 50 (20%) labeled / 200 (80%) unlabeled: DSC 83.31, Jaccard 73.53, 95HD 10.86, ASD 2.23. ...
Hardware Specification	Yes	Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: Sufficient information on the computer resources can be seen in the supplementary material.
Software Dependencies	No	The main paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) needed to replicate the experiments.
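Experiment Setup	Yes	$\mathcal{L}_{ft}(x_i^{\ell}, y_i^{\ell}) = \mathcal{L}_{sup}(p_i^{\ell}, y_i^{\ell}), \quad \mathcal{L}_{sup}(p_i^{\ell}, y_i^{\ell}) = \lambda_{dice}\,\mathcal{L}_{dice}(p_i^{\ell}, y_i^{\ell}) + \lambda_{ce}\,\mathcal{L}_{ce}(p_i^{\ell}, y_i^{\ell})$ (3), where $p_i^{\ell} = f_{\Theta}(x_i^{\ell})$ denotes the model prediction for a labeled image, and $y_i^{\ell}$ is the corresponding ground-truth mask. $\mathcal{L}_{dice}$ and $\mathcal{L}_{ce}$ represent the Dice loss and cross-entropy loss, respectively, with $\lambda_{dice}$ and $\lambda_{ce}$ controlling their relative contributions. ... At each epoch $t$, we update the distillation weight $\lambda_{kd}$ through conditional decay: $\lambda_{kd}^{(t)} = \begin{cases} \alpha\,\lambda_{kd}^{(t-1)}, & \mathcal{L}_{sup}^{S} < \mathcal{L}_{sup}^{T}, \\ \lambda_{kd}^{(t-1)}, & \text{otherwise}, \end{cases}$ (7) where $\alpha$ is a scaling factor (e.g., 0.95) for gradual decay and $\lambda_{kd}^{(0)} = 1.0$ initiates strong guidance.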
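Eq. (3) in the Experiment Setup row defines the supervised loss as a weighted sum of a Dice term and a cross-entropy term. The sketch below illustrates that combination for a single binary mask flattened into a list of per-voxel foreground probabilities. It rests on several assumptions not stated in the excerpt: binary foreground/background labels (so cross-entropy reduces to binary cross-entropy), a soft Dice loss with a small smoothing constant `eps`, and placeholder weights `lambda_dice = lambda_ce = 1.0`, since the reported values are not given here. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Sequence


def dice_loss(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-6) -> float:
    """Soft Dice loss for a binary mask: 1 - (2*|P*Y| + eps) / (|P| + |Y| + eps)."""
    intersection = sum(p * y for p, y in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


def bce_loss(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probabilities are clamped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def supervised_loss(probs: Sequence[float], targets: Sequence[int],
                    lambda_dice: float = 1.0, lambda_ce: float = 1.0) -> float:
    """Weighted sum of Dice and CE terms, mirroring the structure of Eq. (3).

    The default weights are placeholders (assumption), not values from the paper.
    """
    return lambda_dice * dice_loss(probs, targets) + lambda_ce * bce_loss(probs, targets)


# A perfect prediction gives a loss close to zero; an uninformative 0.5 prediction does not.
perfect = supervised_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
uninformative = supervised_loss([0.5, 0.5], [1, 0])
```

For an uninformative prediction of 0.5 on a two-voxel mask with one foreground voxel, the Dice term is about 0.5 and the cross-entropy term is ln 2, so the combined loss is roughly 1.19. In practice a framework implementation operating on tensors would replace these list-based helpers.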
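Eq. (7) in the Experiment Setup row is a simple per-epoch update rule, so it can be sketched as a pure function. Here the superscripts S and T are read as "student" and "teacher" supervised losses; that reading, the use of one scalar loss per model per epoch, and the per-epoch loss values in the usage line are all assumptions for illustration. `alpha = 0.95` and the initial weight of 1.0 follow the example values quoted in the row; the function names are hypothetical.

```python
def update_kd_weight(lambda_prev: float, student_sup_loss: float,
                     teacher_sup_loss: float, alpha: float = 0.95) -> float:
    """One step of the conditional decay rule in Eq. (7)."""
    if student_sup_loss < teacher_sup_loss:
        # Student's supervised loss is already lower: shrink the distillation weight.
        return alpha * lambda_prev
    # Otherwise keep the weight unchanged.
    return lambda_prev


def run_schedule(loss_pairs, lambda_init: float = 1.0, alpha: float = 0.95) -> list[float]:
    """Return [lambda^(0), lambda^(1), ...] for a sequence of (student, teacher) losses."""
    weights = [lambda_init]
    for student_loss, teacher_loss in loss_pairs:
        weights.append(update_kd_weight(weights[-1], student_loss, teacher_loss, alpha))
    return weights


# Hypothetical per-epoch (student, teacher) supervised losses; not taken from the paper.
example = run_schedule([(0.40, 0.35), (0.30, 0.32), (0.25, 0.31)])
```

With these made-up losses the weight stays at 1.0 after the first epoch (the student is still worse), then decays to 0.95 and 0.9025. Because the rule only ever multiplies by `alpha < 1` or leaves the weight unchanged, the distillation weight is non-increasing over training.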
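The Dataset Splits row reports labeled/unlabeled scan counts as percentages. The totals (80 scans for LA, 250 for Brats-2019) are just the sums of the labeled and unlabeled counts in those table rows. The sketch below shows the arithmetic and a seeded random partition. Which specific scans the authors labeled is not stated in the excerpt, so `split_indices` and its `seed` are hypothetical and will not reproduce the paper's exact partition.

```python
import random


def split_counts(total: int, labeled_fraction: float) -> tuple[int, int]:
    """Number of labeled and unlabeled scans for a given labeled fraction."""
    labeled = round(total * labeled_fraction)
    return labeled, total - labeled


def split_indices(total: int, labeled_fraction: float, seed: int = 0):
    """Hypothetical seeded partition of scan indices into labeled/unlabeled sets."""
    labeled, _ = split_counts(total, labeled_fraction)
    indices = list(range(total))
    random.Random(seed).shuffle(indices)  # deterministic for a fixed seed
    return sorted(indices[:labeled]), sorted(indices[labeled:])


# Counts matching the rows quoted above.
la_5 = split_counts(80, 0.05)       # LA, 5% labeled
la_10 = split_counts(80, 0.10)      # LA, 10% labeled
brats_10 = split_counts(250, 0.10)  # Brats-2019, 10% labeled
brats_20 = split_counts(250, 0.20)  # Brats-2019, 20% labeled
```

Fixing the seed makes the partition repeatable across runs, which is the property a reproducibility check cares about. Matching the paper's results would still require the authors' actual list of labeled scans.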