Exploiting Label Skews in Federated Learning with Model Concatenation

Authors: Yiqun Diao, Qinbin Li, Bingsheng He

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that FedConcat achieves significantly higher accuracy than previous state-of-the-art FL methods in various heterogeneous label skew distribution settings and meanwhile has lower communication costs. Our code is publicly available at https://github.com/sjtudyq/FedConcat.
Researcher Affiliation | Academia | National University of Singapore; University of California, Berkeley
Pseudocode | Yes | Algorithm 1: FedConcat and FedConcat-ID (a rough sketch of the two-stage flow appears after the table).
Open Source Code | Yes | Our code is publicly available at https://github.com/sjtudyq/FedConcat.
Open Datasets | Yes | Our experiments engage CIFAR-10 (Krizhevsky, Hinton et al. 2009), FMNIST (Xiao, Rasul, and Vollgraf 2017), SVHN (Netzer et al. 2011), CIFAR-100 (Krizhevsky, Hinton et al. 2009), and Tiny-ImageNet datasets (Wu, Zhang, and Xu 2017) to evaluate our algorithm.
Dataset Splits | No | The paper describes partitioning the data across clients with a specific non-IID strategy, but it does not explicitly state the train/validation/test splits (e.g., percentages or counts) or reference a widely known standard split.
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using the SGD optimizer but does not list software dependencies with version numbers (e.g., Python version, deep learning framework such as PyTorch or TensorFlow, or other library versions).
Experiment Setup | Yes | The baseline settings replicate those from Li et al. (2021): 50 rounds, with each client training 10 local epochs per round, batch size 64, and learning rate 0.01 using the SGD optimizer with weight decay 10^-5. By default, the 40 clients are divided into K = 5 clusters, and 200 rounds are allocated for training the classifier (a local-training sketch with these settings appears after the table).
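
The pseudocode row above points to Algorithm 1 (FedConcat and FedConcat-ID). As a rough illustration of the two-stage idea, clustering clients by label distribution, training one encoder per cluster with FedAvg, then freezing and concatenating the encoders under a jointly trained classifier, here is a minimal PyTorch sketch. The helper names (cluster_clients, fedavg, ConcatModel), the KMeans clustering choice, the unweighted averaging, and the feature-dimension argument are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the FedConcat flow described in the paper; helper names,
# the KMeans clustering step, and tensor shapes are assumptions for illustration.
import copy
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


def cluster_clients(label_distributions, k=5):
    """Group clients with similar label distributions (KMeans assumed)."""
    return KMeans(n_clusters=k, n_init=10).fit_predict(np.stack(label_distributions))


def fedavg(models):
    """Plain FedAvg aggregation: element-wise average of model parameters."""
    avg = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, tensor in avg.state_dict().items():
            stacked = torch.stack([m.state_dict()[name].float() for m in models])
            tensor.copy_(stacked.mean(dim=0))
    return avg


class ConcatModel(nn.Module):
    """Stage 2: frozen concatenated encoders plus a trainable linear classifier."""

    def __init__(self, encoders, feat_dim, num_classes):
        super().__init__()
        self.encoders = nn.ModuleList(encoders)
        for p in self.encoders.parameters():
            p.requires_grad = False  # only the classifier is trained in stage 2
        self.classifier = nn.Linear(feat_dim * len(encoders), num_classes)

    def forward(self, x):
        feats = torch.cat([enc(x) for enc in self.encoders], dim=1)
        return self.classifier(feats)
```

Per the paper, FedConcat-ID targets the setting where clients do not share their label distributions; the same skeleton would apply with the clustering input inferred rather than reported.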
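
The experiment-setup row reports SGD with learning rate 0.01, weight decay 10^-5, batch size 64, and 10 local epochs per round over 50 rounds. A minimal sketch of one client's local update under those settings follows; the model, data loader, and device are placeholders, and this is not the authors' training loop.

```python
# Sketch of one client's local update using the reported baseline settings
# (SGD, lr=0.01, weight decay 1e-5, 10 local epochs); model/loader are placeholders.
import torch
from torch import nn, optim


def local_update(model, loader, epochs=10, lr=0.01, weight_decay=1e-5, device="cpu"):
    """Run one round of local training on a single client and return its weights."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    model.train()
    for _ in range(epochs):              # 10 local epochs per communication round
        for x, y in loader:              # loader assumed built with batch_size=64
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model.state_dict()
```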