Data-Free Knowledge Distillation for Heterogeneous Federated Learning

Authors: Zhuangdi Zhu, Junyuan Hong, Jiayu Zhou

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies powered by theoretical implications show that our approach facilitates FL with better generalization performance using fewer communication rounds, compared with the state-of-the-art.
Researcher Affiliation | Academia | Department of Computer Science and Engineering, Michigan State University, Michigan, USA.
Pseudocode | Yes | Algorithm 1 FEDGEN
Open Source Code | Yes | Code is available at https://github.com/zhuangdizhu/FedGen
Open Datasets | Yes | Dataset: We conduct experiments on three image datasets: MNIST (LeCun & Cortes, 2010), EMNIST (Cohen et al., 2017), and CELEBA (Liu et al., 2015), as suggested by the LEAF FL benchmark (Caldas et al., 2018).
Dataset Splits | No | The paper states, 'We use at most 50% of the total training dataset and distribute it to user models, and use all testing dataset for performance evaluation.' It mentions using a Dirichlet distribution to create non-IID user data, but it does not give explicit percentages or counts for training, validation, or test splits, and no validation set is described. (A sketch of Dirichlet partitioning appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. It only describes the experimental setup in terms of communication rounds, user models, local steps, and batch size.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions) that would be needed to replicate the experiment.
Experiment Setup | Yes | Unless otherwise mentioned, we run 200 global communication rounds, with 20 user models in total and an active-user ratio r = 50%. We adopt a local updating step T = 20, and each step uses a mini-batch with size B = 32. [...] For the classifier, we follow the network architecture of (McMahan et al., 2017), and treat the last MLP layer as the predictor p_k and all previous layers as the feature extractor f_k. The generator G_w is MLP-based. It takes a noise vector and a one-hot label vector y as the input, which, after a hidden layer with dimension d_h, outputs a feature representation with dimension d. (A sketch of this generator architecture appears after the table.)
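
The Dataset Splits row mentions that user data is allocated non-IID via a Dirichlet distribution, without explicit split counts. Below is a minimal sketch of one common way such a partition is implemented; the function name, the alpha value, and the per-class splitting scheme are illustrative assumptions, not the authors' released preprocessing code.

```python
import numpy as np

def dirichlet_partition(labels, num_users=20, alpha=1.0, seed=0):
    """Split sample indices across users, drawing each class's per-user
    proportions from a Dirichlet(alpha) prior; smaller alpha -> more skew.
    `labels` is a 1-D integer array; returns one index array per user."""
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    user_indices = [[] for _ in range(num_users)]

    for c in range(num_classes):
        class_idx = np.flatnonzero(labels == c)
        rng.shuffle(class_idx)
        # Fraction of class c that each user receives.
        proportions = rng.dirichlet(alpha * np.ones(num_users))
        split_points = (np.cumsum(proportions)[:-1] * len(class_idx)).astype(int)
        for user_id, idx in enumerate(np.split(class_idx, split_points)):
            user_indices[user_id].extend(idx.tolist())

    return [np.asarray(idx) for idx in user_indices]
```

With 20 users, as in the paper's setup, a small alpha leaves each user holding only a few dominant classes, which is the heterogeneity regime the paper targets.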
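
The Experiment Setup row describes the generator G_w as an MLP that maps a noise vector and a one-hot label to a feature representation through a single hidden layer of width d_h. The PyTorch sketch below is consistent with that description; the class name, the default dimensions, and the ReLU activation are assumptions for illustration rather than the exact architecture in the authors' repository.

```python
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """MLP generator: (noise z, one-hot label y) -> feature of dimension d."""

    def __init__(self, noise_dim=32, num_classes=10, hidden_dim=256, feature_dim=64):
        super().__init__()
        self.noise_dim = noise_dim
        # One hidden layer of width d_h, then a linear map to the feature
        # dimension d consumed by each user's predictor head p_k.
        self.net = nn.Sequential(
            nn.Linear(noise_dim + num_classes, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, feature_dim),
        )

    def forward(self, y_onehot):
        # Sample fresh noise for every label in the batch and condition on y.
        z = torch.randn(y_onehot.size(0), self.noise_dim, device=y_onehot.device)
        return self.net(torch.cat([z, y_onehot], dim=1))

# Example: generate features for a mini-batch of B = 32 random labels.
gen = FeatureGenerator()
labels = torch.randint(0, 10, (32,))
features = gen(nn.functional.one_hot(labels, num_classes=10).float())  # (32, 64)
```

In FedGen, features produced this way are fed to the users' predictor layers during local training, so only the lightweight generator, rather than any raw data, is communicated between server and users.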