Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Label-Free Backdoor Attacks in Vertical Federated Learning

Authors: Wei Shen, Wenke Huang, Guancheng Wan, Mang Ye

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate the effectiveness of our method in various settings. ... We conduct extensive experiments to demonstrate that our proposed method Label-Free Backdoor Attacks (LFBA), is effective to perform backdoor attacks in various settings, without additional knowledge for task labels.
Researcher Affiliation	Academia	National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, China
Pseudocode	Yes	Algorithm 1: The framework of LFBA in VFL
Open Source Code	Yes	Code https://github.com/shentt67/LFBA/
Open Datasets	Yes	We evaluate our method on four real-world datasets, with data distributed to multiple clients, and only the active client holds the task labels: (1) NUS-WIDE (Chua et al. 2009): A multi-modal dataset contains 1000 text features and 634 image features, labeled with multiple classes. (2) UCIHAR (Anguita et al. 2013): A human activity recognition dataset with six classes: walking, walking upstairs, walking downstairs, sitting, standing, and laying, with 7352 training samples and 2947 testing samples. (3) Phishing (Asuncion, Newman et al. 2007): It provides 30 features indicating whether a website is a phishing website, with 8844 training samples and 2211 test samples. (4) CIFAR-10 (Krizhevsky, Hinton et al. 2009): It is an image dataset for 10 classification tasks with 50000 training samples and 10000 testing samples.
Dataset Splits	Yes	NUS-WIDE... with 69966 training samples and 46693 testing samples. UCIHAR... with 7352 training samples and 2947 testing samples. Phishing... with 8844 training samples and 2211 test samples. CIFAR-10... with 50000 training samples and 10000 testing samples.
Hardware Specification	No	The numerical calculations in this paper have been supported by the super-computing system in the Supercomputing Center of Wuhan University.
Software Dependencies	No	The paper mentions using the Adam optimizer, but does not provide specific version numbers for any software libraries or dependencies like Python, PyTorch, or TensorFlow.
Experiment Setup	Yes	All models are trained until convergence using the Adam optimizer (Kingma and Ba 2015) with a batch size of 256. The learning rate of all models is set to 0.001 for the NUS-WIDE and CIFAR-10 datasets, and 0.003 for the UCI-HAR and Phishing datasets. The poison sample ratio p = N_p / N is set between 0.1 and 0.3, and the switching sample ratio s = N_s / N_p is set between 0 and 1 (e.g., p = 0.1 and s = 0.3 for the NUS-WIDE dataset).
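
The reported setup can be collected into a small configuration sketch. The split sizes, learning rates, and batch size below are taken from the table rows above; the names `Setup`, `SETUPS`, and `poison_counts` are hypothetical and do not come from the LFBA repository. The sketch assumes that N in p = N_p / N is the training-set size and that fractional counts are rounded down; the paper excerpt states neither.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    """Per-dataset settings quoted in the table (hypothetical container)."""
    n_train: int
    n_test: int
    lr: float              # Adam learning rate
    batch_size: int = 256  # same batch size for all datasets


# Values copied from the "Dataset Splits" and "Experiment Setup" rows.
SETUPS = {
    "NUS-WIDE": Setup(n_train=69966, n_test=46693, lr=0.001),
    "UCI-HAR": Setup(n_train=7352, n_test=2947, lr=0.003),
    "Phishing": Setup(n_train=8844, n_test=2211, lr=0.003),
    "CIFAR-10": Setup(n_train=50000, n_test=10000, lr=0.001),
}


def poison_counts(n_train: int, p: float, s: float) -> tuple[int, int]:
    """Return (N_p, N_s) from the ratios p = N_p / N and s = N_s / N_p.

    Assumptions (not stated in the source): N is the number of training
    samples, and fractional counts are truncated toward zero.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    n_poison = int(n_train * p)   # N_p: number of poisoned samples
    n_switch = int(n_poison * s)  # N_s: number of switched samples
    return n_poison, n_switch


if __name__ == "__main__":
    # Example quoted in the table: p = 0.1 and s = 0.3 for NUS-WIDE.
    n_p, n_s = poison_counts(SETUPS["NUS-WIDE"].n_train, p=0.1, s=0.3)
    print(f"NUS-WIDE: N_p = {n_p}, N_s = {n_s}")
```

The validation only enforces that both ratios are valid fractions; the narrower range of 0.1 to 0.3 reported for p is an experimental choice and is deliberately not hard-coded.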