Data-Free Knowledge Distillation for Heterogeneous Federated Learning

Authors: Zhuangdi Zhu, Junyuan Hong, Jiayu Zhou

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies powered by theoretical implications show that our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state-of-the-art.
Researcher Affiliation | Academia | Department of Computer Science and Engineering, Michigan State University, Michigan, USA.
Pseudocode | Yes | Algorithm 1 FEDGEN
Open Source Code | Yes | Code is available at https://github.com/zhuangdizhu/FedGen
Open Datasets | Yes | Dataset: We conduct experiments on three image datasets: MNIST (LeCun & Cortes, 2010), EMNIST (Cohen et al., 2017), and CELEBA (Liu et al., 2015), as suggested by the LEAF FL benchmark (Caldas et al., 2018).
Dataset Splits | No | The paper states, 'We use at most 50% of the total training dataset and distribute it to user models, and use all testing dataset for performance evaluation.' It mentions using a Dirichlet distribution to create non-IID user data, but it does not give explicit percentages or counts for training, validation, or test splits, and no validation set is described. (A sketch of Dirichlet partitioning appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. It only describes the experimental setup in terms of communication rounds, user models, local steps, and batch size.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions) that would be needed to replicate the experiment.
Experiment Setup | Yes | Unless otherwise mentioned, we run 200 global communication rounds, with 20 user models in total and an active-user ratio r = 50%. We adopt a local updating step T = 20, and each step uses a mini-batch with size B = 32. [...] For the classifier, we follow the network architecture of (McMahan et al., 2017), and treat the last MLP layer as the predictor p_k and all previous layers as the feature extractor f_k. The generator G_w is MLP-based. It takes a noise vector and a one-hot label vector y as the input, which, after a hidden layer with dimension d_h, outputs a feature representation with dimension d. (A sketch of this generator architecture appears after the table.)
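
The Dataset Splits row mentions that user data is allocated non-IID via a Dirichlet distribution, without explicit split counts. Below is a minimal sketch of one common way such a partition is implemented; the function name, the alpha value, and the per-class splitting scheme are illustrative assumptions, not the authors' released preprocessing code.

```python
import numpy as np

def dirichlet_partition(labels, num_users=20, alpha=1.0, seed=0):
    """Split sample indices across users, drawing each class's per-user
    proportions from a Dirichlet(alpha) prior; smaller alpha -> more skew.
    `labels` is a 1-D integer array; returns one index array per user."""
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    user_indices = [[] for _ in range(num_users)]

    for c in range(num_classes):
        class_idx = np.flatnonzero(labels == c)
        rng.shuffle(class_idx)
        # Fraction of class c that each user receives.
        proportions = rng.dirichlet(alpha * np.ones(num_users))
        split_points = (np.cumsum(proportions)[:-1] * len(class_idx)).astype(int)
        for user_id, idx in enumerate(np.split(class_idx, split_points)):
            user_indices[user_id].extend(idx.tolist())

    return [np.asarray(idx) for idx in user_indices]
```

With 20 users, as in the paper's setup, a small alpha leaves each user holding only a few dominant classes, which is the heterogeneity regime the paper targets.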
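
The Experiment Setup row describes the generator G_w as an MLP that maps a noise vector and a one-hot label to a feature representation through a single hidden layer of width d_h. The PyTorch sketch below is consistent with that description; the class name, the default dimensions, and the ReLU activation are assumptions for illustration rather than the exact architecture in the authors' repository.

```python
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """MLP generator: (noise z, one-hot label y) -> feature of dimension d."""

    def __init__(self, noise_dim=32, num_classes=10, hidden_dim=256, feature_dim=64):
        super().__init__()
        self.noise_dim = noise_dim
        # One hidden layer of width d_h, then a linear map to the feature
        # dimension d consumed by each user's predictor head p_k.
        self.net = nn.Sequential(
            nn.Linear(noise_dim + num_classes, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, feature_dim),
        )

    def forward(self, y_onehot):
        # Sample fresh noise for every label in the batch and condition on y.
        z = torch.randn(y_onehot.size(0), self.noise_dim, device=y_onehot.device)
        return self.net(torch.cat([z, y_onehot], dim=1))

# Example: generate features for a mini-batch of B = 32 random labels.
gen = FeatureGenerator()
labels = torch.randint(0, 10, (32,))
features = gen(nn.functional.one_hot(labels, num_classes=10).float())  # (32, 64)
```

In FedGen, features produced this way are fed to the users' predictor layers during local training, so only the lightweight generator, rather than any raw data, is communicated between server and users.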