Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Medical Manifestation-Aware De-Identification
Authors: Yuan Tian, Shuo Wang, Guangtao Zhai
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we release MeMa, consisting of over 40,000 photo-realistic patient faces. MeMa is re-generated from massive real patient photos. By carefully modulating the generation and data-filtering procedures, MeMa avoids breaching real patient privacy, while ensuring rich and plausible medical manifestations. We recruit expert clinicians to annotate MeMa with both coarse and fine-grained labels, building the first medical-scene De-ID benchmark. Additionally, we propose a baseline approach for this new medical-aware De-ID task, by integrating data-driven medical semantic priors into the De-ID procedure. Despite its conciseness and simplicity, our approach substantially outperforms previous ones. [...] Experiments Datasets. MeMa: the proposed MeMa dataset consists of 42,307 images in total, which is split into a training set (34,000 images), a hyper-parameter selection set (3,729 images), and a validation set (4,578 images). All images are labeled with the disease category. MeMa-Seg: for the BCC (basal cell carcinoma) disease type, we randomly select 600 images from the training set and 150 images from the validation set of MeMa, annotating the tumor masks for these images. |
| Researcher Affiliation | Academia | Yuan Tian1, Shuo Wang2, Guangtao Zhai2* 1 Shanghai AI Laboratory 2 Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University EMAIL, EMAIL |
| Pseudocode | No | The paper describes the proposed baseline approach (MedSem-DeID) and its components (Medical Semantics Encoding, Medical Semantics-Preserved De-ID) using descriptive text and a high-level diagram (Figure 5), but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Dataset and Code https://github.com/tianyuan168326/MeMa-Pytorch |
| Open Datasets | Yes | In this paper, we release MeMa, consisting of over 40,000 photo-realistic patient faces. MeMa is re-generated from massive real patient photos. [...] Dataset and Code https://github.com/tianyuan168326/MeMa-Pytorch |
| Dataset Splits | Yes | MeMa: the proposed MeMa dataset consists of 42,307 images in total, which is split into a training set (34,000 images), a hyper-parameter selection set (3,729 images), and a validation set (4,578 images). All images are labeled with the disease category. MeMa-Seg: for the BCC (basal cell carcinoma) disease type, we randomly select 600 images from the training set and 150 images from the validation set of MeMa, annotating the tumor masks for these images. |
| Hardware Specification | Yes | It takes about five days to train the model on a machine equipped with two Nvidia A6000 GPUs. [...] Training takes approximately two days on a machine equipped with four Nvidia 4090 GPUs. |
| Software Dependencies | No | The paper mentions specific models like 'Stable Diffusion v1-5' and optimization techniques like 'Adam optimizer' and 'LoRA', but does not provide version numbers for general software dependencies such as Python, PyTorch, or CUDA, which are typically needed for reproducibility. |
| Experiment Setup | Yes | For training the patient face generator model, we fine-tune Stable Diffusion v1-5 (Rombach et al. 2022) using the low-rank adaptation (LoRA) (Hu et al. 2021) technique, with the real patient data. The rank number is set to 64. We use the Adam optimizer (Kingma 2014) with β1 = 0.9 and β2 = 0.99. The learning rate starts at 1 × 10⁻⁴ and follows a cosine decay schedule. The batch size is 32, and the model is trained for ten epochs. [...] For training the MedSem-DeID model, we use the Adam optimizer with β1 = 0.5 and β2 = 0.99. The initial learning rate is 2 × 10⁻⁴ and is halved after 150,000 iterations. The total iteration number is 300,000. The batch size is 16. [...] We set λmed = 5 to achieve the best trade-off between medical accuracy and De-ID. [...] We set λrev = 0.1 to achieve optimal results. |
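The two learning-rate schedules quoted in the Experiment Setup row can be sketched as plain functions. This is a minimal illustration, not code from the released repository; the function names are our own, and only the hyperparameters (1 × 10⁻⁴ with cosine decay; 2 × 10⁻⁴ halved after 150,000 of 300,000 iterations) come from the paper.

```python
import math

def cosine_decay_lr(step: int, total_steps: int, base_lr: float = 1e-4) -> float:
    """Cosine decay from base_lr to 0, as described for the
    patient-face-generator fine-tuning (base LR 1e-4)."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))

def step_decay_lr(step: int, base_lr: float = 2e-4, halve_at: int = 150_000) -> float:
    """Initial LR 2e-4, halved after 150,000 iterations, as described
    for the MedSem-DeID training (300,000 iterations total)."""
    return base_lr if step < halve_at else base_lr / 2.0

print(cosine_decay_lr(0, 300_000))   # 1e-4 at the first step
print(step_decay_lr(200_000))        # 1e-4 after the halving point
```

In a PyTorch training loop these would typically be realized via `torch.optim.lr_scheduler.CosineAnnealingLR` and a step/multi-step scheduler, respectively; the repository linked above is the authoritative source for the exact setup.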