DENSE: Data-Free One-Shot Federated Learning

Authors: Jie Zhang, Chen Chen, Bo Li, Lingjuan Lyu, Shuang Wu, Shouhong Ding, Chunhua Shen, Chao Wu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on a variety of real-world datasets demonstrate the superiority of our method. For example, DENSE outperforms the best baseline method Fed-ADI by 5.08% on the CIFAR10 dataset.
Researcher Affiliation | Collaboration | Zhejiang University; Youtu Lab, Tencent; Sony AI
Pseudocode | Yes | Algorithm 1: Training process of DENSE
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets | Yes | Our experiments are conducted on the following 6 real-world datasets: MNIST [24], FMNIST [53], SVHN [43], CIFAR10 [21], CIFAR100 [21], and Tiny-ImageNet [23]. (A loading sketch follows this table.)
Dataset Splits | Yes | Tiny-ImageNet contains 100,000 images of 200 classes (500 per class) downsized to 64x64 color images. Each class has 500 training images, 50 validation images, and 50 test images.
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU specifications, or memory amounts. It only mentions general settings like 'on the server' or 'train the auxiliary generator G(·)' without hardware specifics.
Software Dependencies | No | The paper mentions software components like 'SGD optimizer' and 'Adam optimizer' but does not specify version numbers for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For clients' local training, we use the SGD optimizer with momentum = 0.9 and learning rate = 0.01. We set the batch size b = 128, the number of local epochs E = 200, and the client number m = 5. Following the setting of [2], we train the auxiliary generator G(·) with a deep convolutional network. We use the Adam optimizer with learning rate ηG = 0.001. We set the number of training rounds in each epoch as TG = 30, and set the scaling factors λ1 = 1 and λ2 = 0.5. For training the server model fS(·), we use the SGD optimizer with learning rate ηS = 0.01 and momentum = 0.9. The number of epochs for distillation is T = 200. All baseline methods use the same setting as ours. (See the configuration sketch after this table.)
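The six benchmark datasets listed in the Open Datasets row are all standard. Below is a minimal loading sketch, assuming torchvision is used (the paper does not name its framework); Tiny-ImageNet is not bundled with torchvision, so the ImageFolder path used for it here is a hypothetical local directory.

```python
# Minimal sketch of loading the benchmark datasets named in the paper,
# assuming torchvision (an assumption; the paper does not state the framework).
import torchvision.datasets as D
import torchvision.transforms as T

root = "./data"
to_tensor = T.ToTensor()

mnist    = D.MNIST(root, train=True, download=True, transform=to_tensor)
fmnist   = D.FashionMNIST(root, train=True, download=True, transform=to_tensor)
svhn     = D.SVHN(root, split="train", download=True, transform=to_tensor)
cifar10  = D.CIFAR10(root, train=True, download=True, transform=to_tensor)
cifar100 = D.CIFAR100(root, train=True, download=True, transform=to_tensor)

# Tiny-ImageNet is not shipped with torchvision; after downloading it
# separately, it can be read with ImageFolder (hypothetical local path).
tiny_imagenet = D.ImageFolder("./data/tiny-imagenet-200/train", transform=to_tensor)
```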
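The hyperparameters reported in the Experiment Setup row can be gathered into a single configuration. The sketch below mirrors those values in PyTorch; the resnet18 client/server models and the one-layer generator are placeholder architectures for illustration only, not the networks used in the paper.

```python
# Hedged sketch of the reported training hyperparameters in PyTorch.
# Model architectures below are placeholders, not the paper's networks.
import torch
import torch.nn as nn
from torchvision.models import resnet18

config = {
    "num_clients": 5,          # m
    "batch_size": 128,         # b
    "local_epochs": 200,       # E
    "client_lr": 0.01,
    "client_momentum": 0.9,
    "generator_lr": 0.001,     # eta_G
    "generator_rounds": 30,    # T_G per epoch
    "lambda1": 1.0,            # scaling factor lambda_1
    "lambda2": 0.5,            # scaling factor lambda_2
    "server_lr": 0.01,         # eta_S
    "server_momentum": 0.9,
    "distill_epochs": 200,     # T
}

# Client-side local training: SGD with momentum, one optimizer per client model.
client_model = resnet18(num_classes=10)                 # placeholder architecture
client_opt = torch.optim.SGD(client_model.parameters(),
                             lr=config["client_lr"],
                             momentum=config["client_momentum"])

# Auxiliary generator: Adam optimizer (the module here is a hypothetical stand-in).
generator = nn.Sequential(nn.Linear(100, 3 * 32 * 32))  # stand-in, not the paper's generator
gen_opt = torch.optim.Adam(generator.parameters(), lr=config["generator_lr"])

# Server-side distillation: SGD with momentum on the server model.
server_model = resnet18(num_classes=10)                 # placeholder architecture
server_opt = torch.optim.SGD(server_model.parameters(),
                             lr=config["server_lr"],
                             momentum=config["server_momentum"])
```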