Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
LeadFL: Client Self-Defense against Model Poisoning in Federated Learning
Authors: Chaoyi Zhu, Stefanie Roos, Lydia Y. Chen
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation shows that LeadFL is able to mitigate bursty adversarial patterns for both iid and non-iid data distributions. It frequently reduces the backdoor accuracy from more than 75% for state-of-the-art defenses to less than 10% while its impact on the main task accuracy is always less than for other client-side defenses. In this section, we demonstrate the effectiveness of LeadFL for multiple server-side defenses. We consider heterogeneous data distributions and compare against state-of-the-art client-side defense mechanisms. |
| Researcher Affiliation | Academia | Chaoyi Zhu¹, Stefanie Roos¹, Lydia Y. Chen¹; ¹Delft University of Technology, Delft, Netherlands. Correspondence to: Lydia Y. Chen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: LeadFL and robust aggregation |
| Open Source Code | Yes | Our code can be found at https://github.com/CarlosChu-c/LeadFL. |
| Open Datasets | Yes | We conduct experiments on Fashion MNIST, CIFAR10 and CIFAR100, which are all benchmark tasks in image classification. |
| Dataset Splits | No | The paper does not explicitly state validation set splits. It mentions training and testing, but not a separate validation set with specific proportions or methods. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments. |
| Software Dependencies | No | We perform all experiments using PyTorch's deep learning framework (Paszke et al., 2019) in combination with the FLTK Testbed. |
| Experiment Setup | Yes | For all datasets, we choose the learning rate η = 0.01 and batch size Ba = 32 for all clients. The model architectures for two datasets are shown in Table 4. For our defense, we set the clipping norm q = 0.2. For the regularization term, we use hyperparameter tuning to choose α = 0.4 for Fashion MNIST, α = 0.25 for CIFAR10, and α = 0.15 for CIFAR100. |
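The reported setup can be collected into one configuration object, which makes it easier to check a rerun against the values above. This is a minimal sketch under stated assumptions: the names `ExperimentConfig`, `ALPHA_BY_DATASET`, and `config_for` are hypothetical and are not taken from the LeadFL repository or the FLTK Testbed; only the numeric values (η, B_a, q, and the per-dataset α) come from the Experiment Setup row.

```python
from dataclasses import dataclass

# Hypothetical layout; only the numbers are taken from the reported setup.


@dataclass(frozen=True)
class ExperimentConfig:
    dataset: str
    alpha: float                 # regularization-term hyperparameter, tuned per dataset
    learning_rate: float = 0.01  # eta, same for all datasets
    batch_size: int = 32         # B_a, same for all clients
    clip_norm: float = 0.2       # q, clipping norm used by the defense


# alpha is the only value reported separately for each dataset.
ALPHA_BY_DATASET = {
    "FashionMNIST": 0.4,
    "CIFAR10": 0.25,
    "CIFAR100": 0.15,
}


def config_for(dataset: str) -> ExperimentConfig:
    """Return the reported setup for one dataset; raises KeyError if it is unknown."""
    return ExperimentConfig(dataset=dataset, alpha=ALPHA_BY_DATASET[dataset])


if __name__ == "__main__":
    for name in ALPHA_BY_DATASET:
        print(config_for(name))
```

Keeping α in a per-dataset lookup mirrors how the paper tunes it separately for each dataset, while η, B_a, and q are shared defaults.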