Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge

Authors: Chaoyang He, Murali Annavaram, Salman Avestimehr

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train CNNs designed based on ResNet-56 and ResNet-110 using three distinct datasets (CIFAR-10, CIFAR-100, and CINIC-10) and their non-I.I.D. variants. Our results show that FedGKT can obtain comparable or even slightly higher accuracy than FedAvg. More importantly, FedGKT makes edge training affordable. Compared to the edge training using FedAvg, FedGKT demands 9 to 17 times less computational power (FLOPs) on edge devices and requires 54 to 105 times fewer parameters in the edge CNN. Our source code is released at FedML (https://fedml.ai).
Researcher Affiliation | Academia | Chaoyang He, Murali Annavaram, Salman Avestimehr; University of Southern California, Los Angeles, CA 90007; chaoyang.he@usc.edu, annavara@usc.edu, avestime@usc.edu
Pseudocode | Yes | Algorithm 1: Group Knowledge Transfer.
Open Source Code | Yes | Our source code is released at FedML (https://fedml.ai).
Open Datasets | Yes | Our training task is image classification on CIFAR-10 [24], CIFAR-100 [24], and CINIC-10 [25].
Dataset Splits | Yes | We also generate their non-I.I.D. variants by splitting training samples into K unbalanced partitions. Details of these three datasets are introduced in Appendix A.1.
Hardware Specification | Yes | Our server node has 4 NVIDIA RTX 2080Ti GPUs with sufficient GPU memory for large model training. We use several CPU-based nodes as clients training small CNNs.
Software Dependencies | No | The paper mentions developing the framework based on FedML [23], but does not provide specific version numbers for FedML or other software dependencies like Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | There are four important hyper-parameters in our FedGKT framework: the communication round, as stated in line #2 of Algorithm 1, the edge-side epoch number, the server-side epoch number, and the server-side learning rate. After a tuning effort, we find that the edge-side epoch number can simply be 1. The server epoch number depends on the data distribution. For I.I.D. data, the value is 20, and for non-I.I.D., the value depends on the level of data bias. For I.I.D., Adam optimizer [65] works better than SGD with momentum [64], while for non-I.I.D., SGD with momentum works better. During training, we reduce the learning rate once the accuracy has plateaued [68, 69]. We use the same data augmentation techniques for fair comparison (random crop, random horizontal flip, and normalization). More details of hyper-parameters are described in Appendix B.4.
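
The Pseudocode row cites Algorithm 1 (Group Knowledge Transfer), which alternates local training on the edge with knowledge distillation on the server: the edge uploads extracted features and logits instead of model weights, and the server returns its own logits as soft labels. Below is a minimal PyTorch-style sketch of one communication round; the toy models, KD temperature `T`, and loss weight `alpha` are assumptions for illustration, not the authors' released FedML implementation.

```python
# Hypothetical sketch of one Group Knowledge Transfer communication round.
import torch
import torch.nn as nn
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, T=3.0):
    """Knowledge-distillation term (KL divergence at temperature T; T is assumed)."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)


def edge_local_step(extractor, classifier, optimizer, batch, server_logits, alpha=1.0):
    """One edge-side update: cross-entropy on local labels plus KD against server logits."""
    x, y = batch
    features = extractor(x)            # compact feature representation H_k
    logits = classifier(features)
    loss = F.cross_entropy(logits, y)
    if server_logits is not None:      # no distillation signal before the first exchange
        loss = loss + alpha * kd_loss(logits, server_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # The edge then uploads (features, logits, labels) instead of model weights.
    return features.detach(), logits.detach(), y


def server_step(server_model, optimizer, features, client_logits, labels, alpha=1.0):
    """One server-side update on transferred features: cross-entropy plus KD against client logits."""
    out = server_model(features)
    loss = F.cross_entropy(out, labels) + alpha * kd_loss(out, client_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return out.detach()                # sent back to the edge as soft labels


if __name__ == "__main__":
    # Toy shapes only: 8 CIFAR-sized images, 10 classes, tiny stand-in models.
    x = torch.randn(8, 3, 32, 32)
    y = torch.randint(0, 10, (8,))
    edge_extractor = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())
    edge_classifier = nn.Linear(16, 10)
    server_model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
    edge_opt = torch.optim.SGD(
        list(edge_extractor.parameters()) + list(edge_classifier.parameters()),
        lr=0.1, momentum=0.9)
    server_opt = torch.optim.Adam(server_model.parameters(), lr=1e-3)

    server_logits = None
    for rnd in range(2):               # two toy communication rounds
        feats, logits, labels = edge_local_step(
            edge_extractor, edge_classifier, edge_opt, (x, y), server_logits)
        server_logits = server_step(server_model, server_opt, feats, logits, labels)
```

In the paper's setting the edge runs a compact CNN and the server trains the large remainder of ResNet-56/110 on the transferred features; the tiny `nn.Sequential` modules above only stand in for those shapes.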
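The Dataset Splits row reports K unbalanced non-I.I.D. partitions but defers the exact scheme to Appendix A.1. One common way to produce such label-skewed shards, shown here purely as an assumption about what "unbalanced partitions" could look like, is Dirichlet sampling over per-class proportions:

```python
# Hypothetical non-I.I.D. split via Dirichlet label skew. The paper defers the
# exact partitioning scheme to its Appendix A.1; the concentration parameter
# `alpha` and this sampling strategy are illustrative assumptions.
import numpy as np


def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices into `num_clients` unbalanced, label-skewed shards."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportion of class c assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions) * len(idx)).astype(int)[:-1]
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return client_indices


if __name__ == "__main__":
    # Example with a CIFAR-10-sized fake label array and 16 clients.
    fake_labels = np.random.randint(0, 10, size=50_000)
    parts = dirichlet_partition(fake_labels, num_clients=16)
    print([len(p) for p in parts])   # unbalanced partition sizes
```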
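The Experiment Setup row pins down several concrete choices: one edge-side epoch, about 20 server-side epochs for I.I.D. data, Adam for I.I.D. versus SGD with momentum for non-I.I.D., learning-rate reduction once accuracy plateaus, and standard augmentation (random crop, random horizontal flip, normalization). The sketch below wires those reported choices together in PyTorch; the crop padding, normalization statistics, learning rates, momentum, and scheduler settings are not stated in this excerpt and are assumptions.

```python
# Hypothetical wiring of the reported training choices in PyTorch.
import torch
from torchvision import transforms

# Augmentation reported in the paper: random crop, random horizontal flip,
# and normalization (CIFAR-10 mean/std and padding=4 are assumptions).
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])


def make_server_optimizer(model, iid=True):
    """Adam for I.I.D. data, SGD with momentum for non-I.I.D., as reported."""
    if iid:
        return torch.optim.Adam(model.parameters(), lr=1e-3)              # lr assumed
    return torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)     # values assumed


def make_scheduler(optimizer):
    """Reduce the learning rate once accuracy plateaus, as the paper describes."""
    return torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.1, patience=10)                   # factor/patience assumed


EDGE_EPOCHS = 1          # reported: the edge-side epoch number can simply be 1
SERVER_EPOCHS_IID = 20   # reported for I.I.D. data; non-I.I.D. depends on the level of bias
```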