Towards Model Agnostic Federated Learning Using Knowledge Distillation

Authors: Andrei Afonin, Sai Praneeth Karimireddy

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a new theoretical framework, Federated Kernel ridge regression, which can capture both model heterogeneity as well as data heterogeneity. Our analysis shows that the degradation is largely due to a fundamental limitation of knowledge distillation under data heterogeneity. We further validate our framework by analyzing and designing new protocols based on KD. Their performance on real world experiments using neural networks, though still unsatisfactory, closely matches our theoretical predictions.
Researcher Affiliation | Academia | Andrei Afonin (EPFL, andrei.afonin@epfl.ch); Sai Praneeth Karimireddy (EPFL and UC Berkeley, sp.karimireddy@berkeley.edu)
Pseudocode | No | The algorithms are described using step-by-step text (e.g., 'a. Agent 1 trains their model...') and summarized in conceptual figures (e.g., 'Figure 1a. Alternating KD starting from agent 1.'), but no formal pseudocode or algorithm block is present. An illustrative sketch of such an alternating KD loop is given after the table.
Open Source Code | No | The paper does not provide any explicit statement or link regarding the release of source code for the described methodology.
Open Datasets | Yes | The real world experiments are conducted using CNN and MLP networks on MNIST, an MLP network and an RF model on MNIST, and VGG16 and CNN models on CIFAR10 datasets.
Dataset Splits | No | Further, we split the training data randomly at proportion 0.7/0.3 in the same data setting. For the different data setting, we split the data by labels: agent 1 has the data points with labels 0 to 4, agent 2 those with labels 5 to 9. Then we take randomly from each agent some α = 0.1 portion of data, combine it, and randomly return data points to both agents from this combined dataset. No explicit mention of a separate validation split percentage or size was found.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only refers generally to 'real world deep learning models' and 'neural networks'.
Software Dependencies | No | In all real world experiments we use the Adam optimizer with a default regularization (weight decay) of 3 × 10⁻⁴, except in the no-regularization case, where it is set to 0. The paper does not provide specific software dependencies with version numbers for the libraries or frameworks used.
Experiment Setup | Yes | In all real world experiments we use the Adam optimizer with a default regularization (weight decay) of 3 × 10⁻⁴, except in the no-regularization case, where it is set to 0. We split the data between 2 agents by giving a bigger part of the data to agent 1 in all same data experiments. ... All other details are presented in Appendix A. ... The real world experiments are conducted using CNN and MLP networks on MNIST... Further, we split the training data randomly at proportion 0.7/0.3 in the same data setting. For the different data setting, we split the data by labels: agent 1 has the data points with labels 0 to 4, agent 2 those with labels 5 to 9. Then we take randomly from each agent some α = 0.1 portion of data, combine it, and randomly return data points to both agents from this combined dataset. An illustrative sketch of these split and optimizer settings follows the table.
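
Since the alternating KD protocol is described only in prose and figures, the following is a minimal illustrative sketch of what such a loop could look like for two agents with heterogeneous models. It is an assumption based on the quoted description ('Agent 1 trains their model...', 'Alternating KD starting from agent 1'), not the authors' code; the names (fit, model1, model2), the number of rounds, the squared distillation loss, and the choice of PyTorch as framework are all hypothetical.

```python
# Hedged sketch of alternating knowledge distillation between two agents.
# Details (loss, rounds, architectures) are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fit(model, x, target, epochs=200, lr=1e-2, weight_decay=3e-4):
    """Fit `model` so that model(x) matches `target` (hard labels or soft predictions)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.mse_loss(model(x), target)  # squared loss, matching the regression view
        loss.backward()
        opt.step()
    return model

# Two agents with different model classes and their own local data (placeholders).
torch.manual_seed(0)
x1, y1 = torch.randn(50, 5), torch.randn(50, 1)  # agent 1's data
x2, y2 = torch.randn(50, 5), torch.randn(50, 1)  # agent 2's data
model1 = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1))
model2 = nn.Linear(5, 1)  # deliberately a different architecture

# Alternating KD starting from agent 1 (one possible reading of the protocol):
fit(model1, x1, y1)                    # a. agent 1 trains on its own labeled data
for _ in range(3):                     # a few alternating rounds (count assumed)
    with torch.no_grad():
        soft2 = model1(x2)             # agent 1's predictions on agent 2's inputs
    fit(model2, x2, soft2)             # b. agent 2 distills from agent 1
    with torch.no_grad():
        soft1 = model2(x1)             # agent 2's predictions on agent 1's inputs
    fit(model1, x1, soft1)             # c. agent 1 distills back from agent 2
```

Variants of the protocol may also mix each agent's own labels into the distillation targets; the sketch above only illustrates the alternating structure.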
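
The quoted split procedure can likewise be made concrete. Only the 0.7/0.3 proportion, the 0-4 / 5-9 label split, the α = 0.1 exchanged portion, and the Adam weight decay of 3 × 10⁻⁴ come from the paper; the shuffling details, the equal-halves redistribution of the shared pool, and the placeholder labels below are assumptions.

```python
# Hedged sketch of the two data-splitting settings described above (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
labels = rng.integers(0, 10, size=n)  # placeholder for MNIST training labels
idx = rng.permutation(n)

# "Same data" setting: random 0.7 / 0.3 split between agent 1 and agent 2.
cut = int(0.7 * n)
agent1_same, agent2_same = idx[:cut], idx[cut:]

# "Different data" setting: split by label (0-4 to agent 1, 5-9 to agent 2) ...
agent1 = np.where(labels <= 4)[0]
agent2 = np.where(labels >= 5)[0]

# ... then each agent contributes an alpha = 0.1 portion to a shared pool,
# which is returned randomly to the two agents (equal halves assumed).
alpha = 0.1
take1 = rng.choice(agent1, size=int(alpha * len(agent1)), replace=False)
take2 = rng.choice(agent2, size=int(alpha * len(agent2)), replace=False)
pool = rng.permutation(np.concatenate([take1, take2]))
back1, back2 = pool[: len(pool) // 2], pool[len(pool) // 2:]
agent1 = np.concatenate([np.setdiff1d(agent1, take1), back1])
agent2 = np.concatenate([np.setdiff1d(agent2, take2), back2])
```

Assuming PyTorch is the (unstated) framework, the quoted optimizer setting would correspond to torch.optim.Adam(model.parameters(), weight_decay=3e-4).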