Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Hot-pluggable Federated Learning: Bridging General and Personalized FL via Dynamic Selection

Authors: Lei Shen, Zhenheng Tang, Lijun Wu, Yonggang Zhang, Xiaowen Chu, Tao Qin, Bo Han

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our comprehensive experiments and ablation studies demonstrate that HPFL significantly outperforms state-of-the-art GFL and PFL algorithms. Additionally, we empirically show HPFL's remarkable potential to resolve other practical FL problems such as continual federated learning and discuss its possible applications in one-shot FL, anarchic FL, and FL plugin market. Our work is the first attempt towards improving GFL performance through a selecting mechanism with personalized plug-ins.
Researcher Affiliation Collaboration Lei Shen1, Zhenheng Tang2, Lijun Wu3, Yonggang Zhang1, Xiaowen Chu2,4, Tao Qin5, Bo Han1; 1 TMLR Group, Department of Computer Science, Hong Kong Baptist University; 2 CSE Department, The Hong Kong University of Science and Technology; 3 Shanghai AI Laboratory; 4 DSA Thrust, The Hong Kong University of Science and Technology (Guangzhou); 5 Microsoft Research AI4Science
Pseudocode Yes Algorithm 1 HPFL.
Open Source Code Yes Our code is released at https://github.com/tmlr-group/HPFL.
Open Datasets Yes We conduct experiments on four commonly used image classification datasets in FL, including CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), Fashion-MNIST (Xiao et al., 2017), and Tiny-ImageNet (Le & Yang, 2015).
Dataset Splits Yes We conduct experiments on four commonly used image classification datasets in FL, including CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), Fashion-MNIST (Xiao et al., 2017), and Tiny-ImageNet (Le & Yang, 2015), with the Latent Dirichlet Sampling (Dir) partition method (α = 0.1, 0.05) to simulate data heterogeneity, following (He et al., 2020b; Li et al., 2021b; Luo et al., 2021; Tang et al., 2022b). ... To make the training data and test data of a client have the same distribution, following the settings of most PFL methods (Collins et al., 2021), we count the number of samples Strain(c, m) in each class c of the training data of client m and split the test data of that client according to that distribution...
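The Dirichlet (Dir) partition quoted above can be sketched as follows. This is a minimal illustration of the standard per-class Dirichlet split used in the cited FL works, not the authors' released code; the function name `dirichlet_partition` and its parameters are assumptions for this example.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients by drawing, for each class,
    client proportions from Dir(alpha); smaller alpha -> more heterogeneity."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Per-class client proportions drawn from a symmetric Dirichlet.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for m, part in enumerate(np.split(idx, cuts)):
            client_indices[m].extend(part.tolist())
    return [np.array(ix) for ix in client_indices]
```

With α = 0.1 (as in the paper's main setting), most clients end up dominated by a few classes; the matched test split described in the quote would then subsample each client's test data to follow its training class counts.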
Hardware Specification Yes We conduct experiments using NVIDIA A100 40GB GPU, AMD EPYC 7742 64-Core Processor Units. The operating system is Ubuntu 20.04.1 LTS. The pytorch version is 1.12.1. The numpy version is 1.23.2. The cuda version is 12.0.
Software Dependencies Yes The pytorch version is 1.12.1. The numpy version is 1.23.2. The cuda version is 12.0.
Experiment Setup Yes We use SGD without momentum as the optimizer for all experiments, with a batch size of 128 and weight decay of 0.0001. The learning rate is set as 0.1 for both the training of the global model and the fine-tuning on local datasets. The main results shown in Table 2 are conducted with 1-layer plug-ins (i.e. only the classifier). We run all algorithms for 1000 communication rounds, with 1 local epoch per round.
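For quick reference, the reported hyperparameters can be collected into a single config. Only the values come from the paper; the dict layout and key names are illustrative, not the authors' actual configuration file.

```python
# Hyperparameters as reported in the Experiment Setup row; key names
# are illustrative assumptions for this sketch.
TRAIN_CONFIG = {
    "optimizer": "SGD",
    "momentum": 0.0,            # SGD without momentum
    "batch_size": 128,
    "weight_decay": 1e-4,
    "lr": 0.1,                  # global training and local fine-tuning
    "communication_rounds": 1000,
    "local_epochs_per_round": 1,
    "plug_in_layers": 1,        # main results: classifier only
}
```

In PyTorch 1.12.1 (the reported version), this would correspond to constructing `torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.0, weight_decay=1e-4)`.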