Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Realistic Semi-supervised Medical Image Classification
Authors: Wenxue Li, Lie Ju, Feilong Tang, Peng Xia, Xinyu Xiong, Ming Hu, Lei Zhu, Zongyuan Ge
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on a variety of medical image datasets demonstrate the superior performance of our proposed method over state-of-the-art Closed-set and Open-set SSL methods. |
| Researcher Affiliation | Academia | 1 Monash University 2 The Hong Kong University of Science and Technology (Guangzhou) 3 UNC-Chapel Hill 4 Sun Yat-sen University |
| Pseudocode | No | The paper describes the methodology in text and mathematical formulas but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We validate our proposed method on diverse datasets comprising multiple modalities, including dermatology, ophthalmology, and endoscopy. Dermatology. We adopt ISIC-2019 (Combalia et al. 2022)... Ophthalmology. We utilize APTOS-2019 (Karthick and Sohier 2019)... To create a more challenging and realistic scenario, we incorporate samples from the i AMD-Challenge (Fang et al. 2022) dataset... Endoscopy. We employ HyperKvasir (Borgli et al. 2020). |
| Dataset Splits | Yes | We consider labeled ratio γ ∈ {5%, 10%, 20%} for ISIC-2019, γ ∈ {10%, 20%} for APTOS-2019, and γ ∈ {1%, 2%} for HyperKvasir. To construct training data, we sample γ% of the samples from each known class as the labeled dataset, while the remaining samples are used to form the unlabeled dataset. We establish the balanced validation set and test set with known classes for each dataset to ensure fair evaluation of the learning performance for every category. |
| Hardware Specification | Yes | All the experiments are implemented on two NVIDIA RTX 4090 GPUs. |
| Software Dependencies | No | The paper mentions using ResNet-50 as the backbone architecture and the Adam optimizer, but does not provide specific version numbers for software libraries or dependencies like PyTorch, TensorFlow, or Python. |
| Experiment Setup | Yes | We train the model for 20,000 iterations. To update prototypes, we train the model with only the supervised training manner for the first 200 iterations... We adopt the Adam optimizer with a batch size of 64. The hyper-parameter λ, which controls the ratio of unlabeled data in each batch, is set to 3. The learning rate is set to 0.0001 and adjusted using the cosine decay strategy. The temperature hyper-parameter τ in Eq. 7 is set to 0.07 empirically. |
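The split protocol quoted in the Dataset Splits row (sample γ% of each known class as labeled data; everything else, including any unknown-class samples in the open-set setting, stays unlabeled) can be sketched as below. The paper releases no code, so the function and variable names here are illustrative, not the authors' implementation:

```python
import random
from collections import defaultdict

def make_ssl_split(labels, known_classes, gamma, seed=0):
    """Return (labeled, unlabeled) index lists for an open-set SSL split.

    labels: per-sample class ids; known_classes: set of known class ids;
    gamma: labeled fraction per known class (e.g. 0.10 for the 10% setting).
    Sketch only -- names and the rounding rule are assumptions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    labeled, unlabeled = [], []
    for cls, idxs in by_class.items():
        rng.shuffle(idxs)
        if cls in known_classes:
            k = max(1, round(gamma * len(idxs)))  # gamma% of this known class
            labeled.extend(idxs[:k])
            unlabeled.extend(idxs[k:])
        else:
            # Open-set: unknown-class samples only ever appear unlabeled.
            unlabeled.extend(idxs)
    return labeled, unlabeled
```

Balanced validation and test sets over the known classes would be drawn separately before this split, as the quote describes.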
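The Experiment Setup row can likewise be collected into a small configuration sketch. The paper only states that the learning rate is "adjusted using the cosine decay strategy", so the exact schedule form below (full-period cosine over all iterations, no restarts) is an assumption:

```python
import math
from dataclasses import dataclass

@dataclass
class TrainConfig:
    """Hyper-parameters reported in the paper; field names are illustrative."""
    total_iters: int = 20_000     # total training iterations
    warmup_sup_iters: int = 200   # supervised-only phase before prototype updates
    batch_size: int = 64
    unlabeled_ratio: int = 3      # lambda: unlabeled-to-labeled ratio per batch
    base_lr: float = 1e-4         # Adam learning rate
    temperature: float = 0.07     # tau in Eq. 7

def cosine_lr(cfg: TrainConfig, it: int) -> float:
    """Cosine-decayed learning rate at iteration `it` (schedule form assumed)."""
    return cfg.base_lr * 0.5 * (1.0 + math.cos(math.pi * it / cfg.total_iters))
```

With these defaults the rate starts at 1e-4 and decays smoothly toward zero by iteration 20,000.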