Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Dual-calibrated Co-training Framework for Personalized Federated Semi-Supervised Medical Image Segmentation
Authors: Delin Pan, Jiansong Fan, Jie Zhu, Lihua Li, Xiang Pan
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show the effectiveness of our method on a private medical dataset and two public medical datasets. |
| Researcher Affiliation | Academia | 1 School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China; 2 Institute of Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou, China; 3 The PRC Ministry of Education Engineering Research Center of Intelligent Technology for Healthcare, Wuxi, Jiangsu 214122, China. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/Medical-AI-Lab-of-JNU/PFSSL |
| Open Datasets | Yes | The Liver Seg23 private dataset, sourced from six hospitals in China... The ISIC-2018 (Codella et al. 2019) dataset for the skin cancer lesion task... The polyp segmentation task uses colonoscopy images from five different data sources for the experiment (Jha et al. 2020; Bernal et al. 2015). |
| Dataset Splits | Yes | In our studies, each sub-dataset was treated as an individual client's private dataset and randomly split into 80% for training and 20% for testing. |
| Hardware Specification | No | The paper states 'We implemented our framework using the Pytorch library on Linux system' but does not provide specific hardware details like GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper mentions 'Pytorch library' and 'Linux system' but does not specify version numbers for these or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | We employed U-Net (Ronneberger, Fischer, and Brox 2015) as the base model for each client, with a pre-trained ResNet-34 (He et al. 2016) as the backbone network. All training images were resized to 256 x 256. Each local model is trained via an AdamW optimizer with a batch size of 8, Adam momentums of 0.9 and 0.999, and a fixed learning rate of 1e-4. We set the federated hyperparameters τ = 0.8 and µ = 0.1. In the semi-supervised setting, the labeled data ratio factor is set to 0.3, the patch size is set to 8, and the patch replacement number hyperparameter η is set to 1. We trained 20 federated rounds in total or until the model converged stably, with the local epoch set to 10 by default. |
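The dataset-split and experiment-setup rows above can be sketched in code. This is a minimal illustration, not the authors' implementation (see their repository for that): the per-client 80/20 random split is shown at the index level, and the reported hyperparameters are collected in a config dict. The function name, seed handling, and dict keys are assumptions for illustration only.

```python
import random

def split_client_dataset(num_samples, train_ratio=0.8, seed=0):
    """Randomly split one client's private dataset into train/test indices.

    The paper reports an 80%/20% random split per client; the seed and
    index-level mechanics here are illustrative assumptions.
    """
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    cut = int(num_samples * train_ratio)
    return indices[:cut], indices[cut:]

# Hyperparameters as reported in the paper's experiment setup
# (key names are illustrative, values are from the paper):
TRAIN_CONFIG = {
    "image_size": (256, 256),
    "optimizer": "AdamW",      # betas (0.9, 0.999), per the reported momentums
    "batch_size": 8,
    "lr": 1e-4,                # fixed learning rate
    "labeled_ratio": 0.3,      # semi-supervised labeled data ratio factor
    "patch_size": 8,
    "patch_replacements": 1,   # hyperparameter eta
    "federated_rounds": 20,
    "local_epochs": 10,
}

train_idx, test_idx = split_client_dataset(100)
print(len(train_idx), len(test_idx))  # 80 20
```

Splitting at the index level keeps the mechanics dataset-agnostic; the same indices can then drive any image/label loader for that client.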