Towards Model Agnostic Federated Learning Using Knowledge Distillation

Authors: Andrei Afonin, Sai Praneeth Karimireddy

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a new theoretical framework, Federated Kernel ridge regression, which can capture both model heterogeneity as well as data heterogeneity. Our analysis shows that the degradation is largely due to a fundamental limitation of knowledge distillation under data heterogeneity. We further validate our framework by analyzing and designing new protocols based on KD. Their performance on real world experiments using neural networks, though still unsatisfactory, closely matches our theoretical predictions.
Researcher Affiliation | Academia | Andrei Afonin (EPFL, andrei.afonin@epfl.ch); Sai Praneeth Karimireddy (EPFL and UC Berkeley, sp.karimireddy@berkeley.edu)
Pseudocode | No | The algorithms are described using step-by-step text (e.g., 'a. Agent 1 trains their model...') and summarized in conceptual figures (e.g., 'Figure 1a. Alternating KD starting from agent 1.'), but no formal pseudocode or algorithm block is present. An illustrative sketch of such an alternating KD loop is given after the table.
Open Source Code | No | The paper does not provide any explicit statement or link regarding the release of source code for the described methodology.
Open Datasets | Yes | The real world experiments are conducted using CNN and MLP networks on MNIST, an MLP network and an RF model on MNIST, and VGG16 and CNN models on CIFAR10 datasets.
Dataset Splits | No | Further, we split the training data randomly at proportion 0.7/0.3 in the same data setting. For the different data setting, we split the data by labels: agent 1 has the data points with labels 0 to 4, agent 2 those with labels 5 to 9. Then we take randomly from each agent some α = 0.1 portion of data, combine it, and randomly return data points to both agents from this combined dataset. No explicit mention of a separate validation split percentage or size was found.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only refers generally to 'real world deep learning models' and 'neural networks'.
Software Dependencies | No | In all real world experiments we use the Adam optimizer with a default regularization (weight decay) of 3 × 10⁻⁴, except in the no-regularization case, where it is set to 0. The paper does not provide specific software dependencies with version numbers for the libraries or frameworks used.
Experiment Setup | Yes | In all real world experiments we use the Adam optimizer with a default regularization (weight decay) of 3 × 10⁻⁴, except in the no-regularization case, where it is set to 0. We split the data between 2 agents by giving a bigger part of the data to agent 1 in all same data experiments. ... All other details are presented in Appendix A. ... The real world experiments are conducted using CNN and MLP networks on MNIST... Further, we split the training data randomly at proportion 0.7/0.3 in the same data setting. For the different data setting, we split the data by labels: agent 1 has the data points with labels 0 to 4, agent 2 those with labels 5 to 9. Then we take randomly from each agent some α = 0.1 portion of data, combine it, and randomly return data points to both agents from this combined dataset. An illustrative sketch of these split and optimizer settings follows the table.
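
Since the alternating KD protocol is described only in prose and figures, the following is a minimal illustrative sketch of what such a loop could look like for two agents with heterogeneous models. It is an assumption based on the quoted description ('Agent 1 trains their model...', 'Alternating KD starting from agent 1'), not the authors' code; the names (fit, model1, model2), the number of rounds, the squared distillation loss, and the choice of PyTorch as framework are all hypothetical.

```python
# Hedged sketch of alternating knowledge distillation between two agents.
# Details (loss, rounds, architectures) are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit(model, x, target, epochs=200, lr=1e-2, weight_decay=3e-4):
    """Fit `model` so that model(x) matches `target` (hard labels or soft predictions)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.mse_loss(model(x), target)  # squared loss, matching the regression view
        loss.backward()
        opt.step()
    return model

# Two agents with different model classes and their own local data (placeholders).
torch.manual_seed(0)
x1, y1 = torch.randn(50, 5), torch.randn(50, 1)  # agent 1's data
x2, y2 = torch.randn(50, 5), torch.randn(50, 1)  # agent 2's data
model1 = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1))
model2 = nn.Linear(5, 1)  # deliberately a different architecture

# Alternating KD starting from agent 1 (one possible reading of the protocol):
fit(model1, x1, y1)                    # a. agent 1 trains on its own labeled data
for _ in range(3):                     # a few alternating rounds (count assumed)
    with torch.no_grad():
        soft2 = model1(x2)             # agent 1's predictions on agent 2's inputs
    fit(model2, x2, soft2)             # b. agent 2 distills from agent 1
    with torch.no_grad():
        soft1 = model2(x1)             # agent 2's predictions on agent 1's inputs
    fit(model1, x1, soft1)             # c. agent 1 distills back from agent 2
```

Variants of the protocol may also mix each agent's own labels into the distillation targets; the sketch above only illustrates the alternating structure.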
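
The quoted split procedure can likewise be made concrete. Only the 0.7/0.3 proportion, the 0-4 / 5-9 label split, the α = 0.1 exchanged portion, and the Adam weight decay of 3 × 10⁻⁴ come from the paper; the shuffling details, the equal-halves redistribution of the shared pool, and the placeholder labels below are assumptions.

```python
# Hedged sketch of the two data-splitting settings described above (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
labels = rng.integers(0, 10, size=n)  # placeholder for MNIST training labels
idx = rng.permutation(n)

# "Same data" setting: random 0.7 / 0.3 split between agent 1 and agent 2.
cut = int(0.7 * n)
agent1_same, agent2_same = idx[:cut], idx[cut:]

# "Different data" setting: split by label (0-4 to agent 1, 5-9 to agent 2) ...
agent1 = np.where(labels <= 4)[0]
agent2 = np.where(labels >= 5)[0]

# ... then each agent contributes an alpha = 0.1 portion to a shared pool,
# which is returned randomly to the two agents (equal halves assumed).
alpha = 0.1
take1 = rng.choice(agent1, size=int(alpha * len(agent1)), replace=False)
take2 = rng.choice(agent2, size=int(alpha * len(agent2)), replace=False)
pool = rng.permutation(np.concatenate([take1, take2]))
back1, back2 = pool[: len(pool) // 2], pool[len(pool) // 2:]
agent1 = np.concatenate([np.setdiff1d(agent1, take1), back1])
agent2 = np.concatenate([np.setdiff1d(agent2, take2), back2])
```

Assuming PyTorch is the (unstated) framework, the quoted optimizer setting would correspond to torch.optim.Adam(model.parameters(), weight_decay=3e-4).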