One-shot Federated Learning via Synthetic Distiller-Distillate Communication

Authors: Junyuan Zhang, Songhua Liu, Xinchao Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results demonstrate that FedSD2C consistently outperforms other one-shot FL methods with more complex and real datasets, achieving up to 2.6× the performance of the best baseline.
Researcher Affiliation | Academia | Junyuan Zhang (1,2), Songhua Liu (1), Xinchao Wang (1) — (1) National University of Singapore, (2) Beihang University
Pseudocode | Yes | Algorithm 1: One-shot Federated Learning via Synthetic Distiller-Distillate Communication
Open Source Code | Yes | Code: https://github.com/Carkham/FedSD2C
Open Datasets | Yes | We conduct experiments on three real-world image datasets with different ranges of resolution, including Tiny-ImageNet [54], ImageNette [55], and OpenImage [56].
Dataset Splits | Yes | To simulate data heterogeneity in real-world applications of one-shot FL, we use a Dirichlet distribution to generate non-IID local data, as in [58], for Tiny-ImageNet and ImageNette. Specifically, for client i, we sample p_{i,k} ∼ Dir(α) to allocate a p_{i,k} proportion of class k to client i. The parameter α controls the degree of data heterogeneity, with smaller α indicating more severe heterogeneity. α is set to 0.1 by default unless otherwise stated. For OpenImage, we randomly choose n real-world clients from FedScale [59] and use their corresponding test sets to form the global sets. (A partitioning sketch follows the table.)
Hardware Specification | Yes | All methods are implemented with PyTorch and conducted on a GeForce RTX 3090.
Software Dependencies | Yes | All methods are implemented with PyTorch and conducted on a GeForce RTX 3090.
Experiment Setup | Yes | We use the SGD optimizer with momentum = 0.9, learning rate = 0.01, and weight decay = 0.0001 for clients' local training. The batch size is set to 128 and the number of local epochs to 200. For all generation-based methods, we set the resolution of the generated images to 64×64, 128×128, and 256×256 for Tiny-ImageNet, ImageNette, and OpenImage, respectively; the number of generated images in each batch is 128, the learning rate of the generator is 0.001, the latent dimension is 256, and the generator is trained for 30 iterations using Adam. The server model is optimized with SGD with momentum 0.9, a learning rate of 0.01, and 200 training epochs. The synthesized batch size and the server-model training batch size are both 128. In DENSE, we set λ1 = 1 for the BN loss and λ2 = 0.5 for the diversity loss. In Co-Boosting, the perturbation strength is set to ϵ = 8/255 and the step size to µ = 0.1/n. In the Core-Set selection stage of FedSD2C, for each image x_i, we apply torchvision.transforms.RandomResizedCrop K times to generate a collection of patches; for the patch scale, we set scale=(0.08, 1.0) to collect diverse image patches (a patch-collection sketch follows the table). Following [42], we employ ConvNet-4 for Tiny-ImageNet, ConvNet-5 for ImageNette, and ConvNet-6 for OpenImage.
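
The Dirichlet split quoted in the Dataset Splits row can be reproduced with a short partitioning routine. The sketch below is illustrative only, assuming integer class labels; the helper name dirichlet_partition and its arguments are not taken from the FedSD2C codebase.

    import numpy as np

    def dirichlet_partition(labels, num_clients, alpha=0.1, seed=0):
        """Allocate sample indices to clients with Dir(alpha) class proportions.

        Illustrative helper, not from the FedSD2C codebase. Smaller alpha
        yields more severe heterogeneity, matching the setup above.
        """
        rng = np.random.default_rng(seed)
        labels = np.asarray(labels)
        client_indices = [[] for _ in range(num_clients)]
        for k in np.unique(labels):
            # shuffle the indices of all samples belonging to class k
            idx_k = rng.permutation(np.where(labels == k)[0])
            # p_{i,k} ~ Dir(alpha): proportion of class k allocated to client i
            proportions = rng.dirichlet(alpha * np.ones(num_clients))
            # cumulative split points carve this class's samples into shards
            splits = (np.cumsum(proportions)[:-1] * len(idx_k)).astype(int)
            for client_id, shard in enumerate(np.split(idx_k, splits)):
                client_indices[client_id].extend(shard.tolist())
        return client_indices

With alpha = 0.1 (the default above), most clients receive only a few dominant classes, which is the intended non-IID behavior.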
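Likewise, the Core-Set patch-collection step in the Experiment Setup row reduces to repeated torchvision crops. This is a minimal sketch assuming a (C, H, W) image tensor as input; collect_patches, K, and patch_size are illustrative names, while scale=(0.08, 1.0) follows the reported setting.

    import torch
    from torchvision import transforms

    def collect_patches(image, patch_size=64, K=4):
        """Return K randomly cropped-and-resized patches of one image tensor.

        Illustrative sketch of applying RandomResizedCrop K times per image,
        as described in the Core-Set selection stage above.
        """
        cropper = transforms.RandomResizedCrop(patch_size, scale=(0.08, 1.0))
        # each call samples a new random crop, so the K patches differ
        return torch.stack([cropper(image) for _ in range(K)])

    # usage example with a dummy 3x64x64 image:
    # patches = collect_patches(torch.rand(3, 64, 64), patch_size=64, K=8)

The wide scale range (down to 8% of the image area) is what makes the collected patches diverse rather than near-duplicates of the full image.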