Enhancing One-Shot Federated Learning Through Data and Ensemble Co-Boosting
Authors: Rong Dai, Yonggang Zhang, Ang Li, Tongliang Liu, Xun Yang, Bo Han
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that Co-Boosting can substantially outperform existing baselines under various settings. Moreover, Co-Boosting eliminates the need for adjustments to the client's local training, requires no additional data or model transmission, and allows client models to have heterogeneous architectures. |
| Researcher Affiliation | Collaboration | Rong Dai 1,2, Yonggang Zhang 2, Ang Li 3, Tongliang Liu 4, Xun Yang 1, Bo Han 2. 1 University of Science and Technology of China; 2 TMLR Group, Hong Kong Baptist University; 3 ECE Department, University of Maryland College Park; 4 Sydney AI Centre, The University of Sydney |
| Pseudocode | Yes | Algorithm 1 Co-Boosting |
| Open Source Code | Yes | Code is available at https://github.com/rong-dai/Co-Boosting |
| Open Datasets | Yes | We conduct experiments on five real-world image datasets that are standard in the FL literature: MNIST (LeCun et al., 1998), FMNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011), CIFAR10, and CIFAR100 (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | For the FedDF method, we use 20% of the training set as a validation set for distillation. (See the hold-out split sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions model architectures (a CNN following the PyTorch tutorial (Paszke et al., 2019) and LeNet-5) and the SGD and Adam optimizers, but does not provide version numbers for the software dependencies or libraries, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | For each client's local training, we use the SGD optimizer with momentum=0.9 and learning rate=0.01. We set the batch size to 128 and the number of local epochs to 300. The generator we use is the same as in Zhang et al. (2022a); Chen et al. (2019), and it is trained by the Adam optimizer with a learning rate ηg = 1e-3 over TG = 30 rounds. The distillation temperature used in the knowledge-distillation stage for the server model is set to 4, while the temperature used in the KL loss of the generator loss is set to 1. The perturbation strength is set to ϵ = 8/255 and the step size µ is set to 0.1/n. For the training of the server model f_S(·), we use the SGD optimizer with learning rate ηS = 0.01 and momentum=0.9. The number of total epochs T is set to 500. (See the configuration sketch below the table.) |
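
The FedDF baseline's 20% validation hold-out quoted in the Dataset Splits row can be reproduced along these lines. This is a minimal sketch, assuming CIFAR-10 and a fixed seed; the paper does not specify the splitting code itself.

```python
# Hypothetical hold-out split: 20% of the training set reserved as a
# distillation validation set for the FedDF baseline. The 20% figure comes
# from the paper; the dataset choice and seed are assumptions.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())

n_val = int(0.2 * len(train_set))    # 20% for distillation validation
n_train = len(train_set) - n_val     # remaining 80% for client training
train_subset, val_subset = random_split(
    train_set, [n_train, n_val],
    generator=torch.Generator().manual_seed(0),  # seed is an assumption
)
```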
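The hyperparameters quoted in the Experiment Setup row translate into optimizer settings roughly as follows. This is a sketch under stated assumptions: the model and generator definitions are placeholders rather than the authors' architectures, and the number of perturbation steps `n` is not given in the quoted text, so it is set to an illustrative value.

```python
# Hyperparameter wiring for the reported setup; all network definitions below
# are placeholders, not the authors' models.
import torch
from torch import nn

client_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder
generator    = nn.Sequential(nn.Linear(100, 3 * 32 * 32))               # placeholder
server_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder

# Client local training: SGD with lr=0.01, momentum=0.9; batch size 128; 300 local epochs.
client_opt = torch.optim.SGD(client_model.parameters(), lr=0.01, momentum=0.9)
BATCH_SIZE, LOCAL_EPOCHS = 128, 300

# Generator training: Adam with lr=1e-3 over T_G=30 rounds; KL temperature 1 in the generator loss.
gen_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
T_G, GEN_KL_TEMPERATURE = 30, 1.0

# Hard-sample perturbation: strength eps=8/255, step size mu=0.1/n.
EPSILON = 8 / 255
N_STEPS = 8                      # assumption: n is not specified in the quoted text
MU = 0.1 / N_STEPS

# Server model f_S: SGD with lr=0.01, momentum=0.9; T=500 epochs; distillation temperature 4.
server_opt = torch.optim.SGD(server_model.parameters(), lr=0.01, momentum=0.9)
TOTAL_EPOCHS, DISTILL_TEMPERATURE = 500, 4.0
```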