Data-Free Knowledge Distillation for Heterogeneous Federated Learning
Authors: Zhuangdi Zhu, Junyuan Hong, Jiayu Zhou
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies powered by theoretical implications show that our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state-of-the-art. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, Michigan State University, Michigan, USA. |
| Pseudocode | Yes | Algorithm 1 FEDGEN |
| Open Source Code | Yes | Code is available at https://github.com/zhuangdizhu/FedGen |
| Open Datasets | Yes | Dataset: We conduct experiments on three image datasets: MNIST (LeCun & Cortes, 2010), EMNIST (Cohen et al., 2017), and CELEBA (Liu et al., 2015), as suggested by the LEAF FL benchmark (Caldas et al., 2018). |
| Dataset Splits | No | The paper states, 'We use at most 50% of the total training dataset and distribute it to user models, and use all testing dataset for performance evaluation.' It mentions using a Dirichlet distribution to create non-IID data partitions (see the sketch after the table), but it does not explicitly provide percentages or counts for training, validation, or test splits, and no explicit validation set is described. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. It only describes the experimental setup in terms of communication rounds, user models, local steps, and batch size. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | Unless otherwise mentioned, we run 200 global communication rounds, with 20 user models in total and an active-user ratio r = 50%. We adopt a local updating step T = 20, and each step uses a mini-batch with size B = 32. [...] For the classifier, we follow the network architecture of (McMahan et al., 2017), and treat the last MLP layer as the predictor p_k and all previous layers as the feature extractor f_k. The generator G_w is MLP-based. It takes a noise vector and a one-hot label vector y as the input, which, after a hidden layer with dimension d_h, outputs a feature representation with dimension d. (A generator sketch follows the table.) |
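
The Dirichlet-based non-IID partition mentioned in the Dataset Splits row is only described at a high level in the paper. Below is a minimal sketch of one common way such a partition is implemented, assuming a symmetric Dirichlet(alpha) prior per class; the function name and the `alpha`, `num_users`, and `seed` parameters are illustrative and not taken from the paper or its repository.

```python
import numpy as np

def dirichlet_partition(labels, num_users=20, alpha=1.0, seed=0):
    """Split sample indices across users with a Dirichlet prior over classes.

    Smaller alpha -> more skewed (non-IID) label distributions per user.
    Illustrative only; the exact partitioning code used in the paper is in
    the authors' repository (https://github.com/zhuangdizhu/FedGen).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1
    user_indices = [[] for _ in range(num_users)]

    for c in range(num_classes):
        class_idx = np.flatnonzero(labels == c)
        rng.shuffle(class_idx)
        # Sample class-c proportions for each user from Dirichlet(alpha).
        proportions = rng.dirichlet(alpha * np.ones(num_users))
        # Convert proportions to split points over this class's indices.
        split_points = (np.cumsum(proportions)[:-1] * len(class_idx)).astype(int)
        for uid, chunk in enumerate(np.split(class_idx, split_points)):
            user_indices[uid].extend(chunk.tolist())

    return [np.array(idx) for idx in user_indices]
```

Smaller `alpha` values concentrate each class on fewer users, which is how Dirichlet-based benchmarks typically control the degree of statistical heterogeneity.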
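
The Experiment Setup row describes the generator only as an MLP that maps a noise vector and a one-hot label, through a hidden layer of dimension d_h, to a feature of dimension d. A minimal PyTorch sketch consistent with that description is shown below; the class name, default dimensions, and the BatchNorm/ReLU choices are assumptions for illustration, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """MLP generator sketch: (noise, one-hot label) -> latent feature.

    noise_dim, hidden_dim (d_h), and feature_dim (d) are placeholders chosen
    for illustration; the paper's text does not fix their values.
    """

    def __init__(self, noise_dim=32, num_classes=10, hidden_dim=256, feature_dim=32):
        super().__init__()
        self.noise_dim = noise_dim
        self.num_classes = num_classes
        self.net = nn.Sequential(
            nn.Linear(noise_dim + num_classes, hidden_dim),  # hidden layer of dim d_h
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, feature_dim),              # feature representation of dim d
        )

    def forward(self, y):
        # y: (batch,) integer labels -> one-hot; concatenate with Gaussian noise.
        one_hot = nn.functional.one_hot(y, self.num_classes).float()
        eps = torch.randn(y.size(0), self.noise_dim, device=y.device)
        return self.net(torch.cat([eps, one_hot], dim=1))
```

For example, `FeatureGenerator()(torch.randint(0, 10, (32,)))` returns a batch of 32 synthetic feature vectors of dimension `feature_dim`, which the classifier's predictor layer can then consume.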