Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Defending Against Sophisticated Poisoning Attacks with RL-based Aggregation in Federated Learning
Authors: Yujing Wang, Hainan Zhang, Sijia Wen, Wangjie Qiu, Binghui Guo
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on four real-world datasets demonstrate that the proposed defense model significantly outperforms widely adopted defense models for sophisticated attacks. |
| Researcher Affiliation | Academia | Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing; School of Artificial Intelligence, Beihang University, China |
| Pseudocode | Yes | In the AdaAggRL framework (see Algorithm in Appendix), the server determines the weights for local model aggregation by assessing the stability of client data distributions, as shown in Figure 3. |
| Open Source Code | Yes | Code: https://github.com/TAP-LLM/AdaAggRL. |
| Open Datasets | Yes | We conduct experiments on four datasets: MNIST (LeCun et al. 1998), F-MNIST (Xiao, Rasul, and Vollgraf 2017), EMNIST (Cohen et al. 2017), and CIFAR-10 (Krizhevsky, Hinton et al. 2009). |
| Dataset Splits | Yes | Addressing the non-i.i.d. challenge in FL, we follow the approach from prior work (Fang et al. 2020) by distributing training examples across all clients. Given an M-class dataset, clients are randomly divided into M groups. A training sample with label l is assigned to its respective group with probability q, and to each of the other groups with probability (1 − q)/(M − 1). Training samples within the same client group adhere to the same distribution. When q = 1/M, the distribution of training samples across the M groups is uniform, ensuring that all clients' datasets follow the same distribution. When q > 1/M, the datasets among clients are not identically distributed. Using the MNIST dataset, we set q = 0.5 to distribute training samples among clients unevenly, denoted as MNIST-0.5. MNIST-0.1 represents the scenario where MNIST is evenly distributed among clients (q = 0.1). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) were mentioned for the experiments. |
| Software Dependencies | No | The paper mentions using the TD3 algorithm and pre-trained CNNs but does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | In FL, there are 100 clients, denoted as K = 100, with 20 malicious clients. For defense strategies based on RL, given the continuous action and state spaces, we select the Twin Delayed DDPG (TD3) (Fujimoto, Hoof, and Meger 2018) algorithm to train the defense policies in experiments. Details on parameter determination are provided in the Appendix. For MNIST, F-MNIST, and EMNIST, a Convolutional Neural Network (CNN) serves as the global model. In the case of CIFAR-10, the ResNet18 architecture (He et al. 2016) is utilized as the global model. Gradient inversion optimization is restricted to 30 steps, with 16 dummy images. We assess FL defense methods by evaluating the global model's image classification accuracy after 500 epochs of FL training... |
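The non-i.i.d. partition scheme quoted in the Dataset Splits row (Fang et al. 2020) can be sketched as follows. This is a minimal illustration, not the paper's released code; the function name `split_non_iid` and its parameters are invented for the example.

```python
import random
from collections import defaultdict

def split_non_iid(labels, num_classes, q, seed=0):
    """Assign each sample index to one of `num_classes` client groups.

    A sample with label l goes to group l with probability q, and to each
    of the other groups with probability (1 - q) / (num_classes - 1).
    With q = 1/num_classes the assignment is uniform (i.i.d. case);
    larger q skews samples toward their own class group (non-i.i.d.).
    """
    rng = random.Random(seed)
    other = (1.0 - q) / (num_classes - 1)
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        # Weight q for the sample's own group, `other` for every remaining group.
        weights = [q if g == label else other for g in range(num_classes)]
        g = rng.choices(range(num_classes), weights=weights, k=1)[0]
        groups[g].append(idx)
    return groups

# Example: a 10-class toy label list; q = 0.5 mirrors the MNIST-0.5 setting.
labels = [i % 10 for i in range(1000)]
groups = split_non_iid(labels, num_classes=10, q=0.5)
```

Within each group, samples would then be distributed among the clients assigned to that group; with q = 0.1 on 10 classes the same code reproduces the uniform MNIST-0.1 setting.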