Federated Learning from Small Datasets

Authors: Michael Kamp, Jonas Fischer, Jilles Vreeken

ICLR 2023

Reproducibility Variable Result LLM Response
Research Type: Experimental. We show that our simple, yet effective approach maintains privacy of local datasets, while it provably converges and guarantees improvement of model quality in convex problems with a suitable aggregation method. Formally, we show convergence for FEDDC on non-convex problems. We then show for convex problems that FEDDC succeeds on small datasets where standard federated learning fails. For that, we analyze FEDDC combined with aggregation via the Radon point from a PAC-learning perspective. We substantiate this theoretical analysis for convex problems by showing that FEDDC in practice matches the accuracy of a model trained on the full data of the SUSY binary classification dataset with only 2 samples per client, outperforming standard federated learning by a wide margin. For non-convex settings, we provide an extensive empirical evaluation, showing that FEDDC outperforms naive daisy-chaining, vanilla federated learning FEDAVG (McMahan et al., 2017), FEDPROX (Li et al., 2020a), FEDADAGRAD, FEDADAM, and FEDYOGI (Reddi et al., 2020) on low-sample CIFAR10 (Krizhevsky, 2009), including non-iid settings, and, more importantly, on two real-world medical imaging datasets.
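For context on the Radon-point aggregation mentioned in the excerpt above: a Radon point of d + 2 points in R^d lies in the convex hulls of both parts of some partition of those points, and it can be computed by solving a small homogeneous linear system. The sketch below is a generic pure-Python illustration of that construction, not the authors' implementation; the names `null_vector` and `radon_point` are made up for this example.

```python
def null_vector(A, eps=1e-12):
    """Return a non-trivial solution v of A v = 0 for an m x n matrix
    with m < n, via Gauss-Jordan elimination (small systems only)."""
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    piv_cols, r = [], 0
    for c in range(n):
        if r == m:
            break
        p = max(range(r, m), key=lambda i: abs(A[i][c]))  # partial pivoting
        if abs(A[p][c]) < eps:
            continue  # no pivot in this column; it stays free
        A[r], A[p] = A[p], A[r]
        pv = A[r][c]
        A[r] = [v / pv for v in A[r]]
        for i in range(m):
            if i != r and A[i][c] != 0.0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        piv_cols.append(c)
        r += 1
    # set one free variable to 1 and back-substitute the pivot variables
    free = next(c for c in range(n) if c not in piv_cols)
    v = [0.0] * n
    v[free] = 1.0
    for row_idx, c in enumerate(piv_cols):
        v[c] = -A[row_idx][free]
    return v

def radon_point(points):
    """Radon point of d+2 points in R^d: solve sum(l_i * x_i) = 0 and
    sum(l_i) = 0 for non-trivial l, then take the convex combination
    of the points with positive coefficients."""
    d = len(points[0])
    assert len(points) == d + 2
    A = [[p[j] for p in points] for j in range(d)] + [[1.0] * (d + 2)]
    lam = null_vector(A)
    pos = [i for i, l in enumerate(lam) if l > 0]
    s = sum(lam[i] for i in pos)
    return [sum(lam[i] * points[i][j] for i in pos) / s for j in range(d)]
```

For instance, `radon_point([[0.0], [1.0], [2.0]])` returns `[1.0]`: the point shared by the convex hulls of {0, 2} and {1}.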
Researcher Affiliation: Academia. Michael Kamp, Institute for AI in Medicine (IKIM), University Hospital Essen, Essen, Germany; Ruhr-University Bochum, Bochum, Germany; and Monash University, Melbourne, Australia (michael.kamp@uk-essen.de). Jonas Fischer, Harvard T.H. Chan School of Public Health, Department of Biostatistics, Boston, MA, United States (jfischer@hsph.harvard.edu). Jilles Vreeken, CISPA Helmholtz Center for Information Security, Saarbrücken, Germany (vreeken@cispa.de).
Pseudocode: Yes. We provide the pseudocode of our approach as Algorithm 1.
Open Source Code: Yes. Details on the experimental setup are in App. A.1.1 and A.1.2; code is publicly available at https://github.com/kampmichael/FedDC.
Open Datasets: Yes. As datasets we consider a synthetic classification dataset, image classification in CIFAR10 (Krizhevsky, 2009), and two real medical datasets: MRI scans for brain tumors (kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection) and chest X-rays for pneumonia (kaggle.com/praveengovi/coronahack-chest-xraydataset). ... We substantiate this theoretical analysis for convex problems by showing that FEDDC in practice matches the accuracy of a model trained on the full data of the SUSY binary classification dataset (Baldi et al., 2014) with only 2 samples per client...
Dataset Splits: No. The paper specifies training and test sets but does not explicitly mention or detail a validation split for its experiments. For example, Figure 2 shows 'train (green) and test error (orange)', and Table 1 reports 'average test accuracy'.
Hardware Specification: No. The paper mentions 'Due to hardware restrictions we are limited to training 150 ResNets' in Footnote 4, but does not provide any specific details about the hardware used (e.g., CPU/GPU models, memory).
Software Dependencies: No. The paper implicitly mentions software components such as sklearn (Pedregosa et al., 2011) and the use of ResNet, suggesting deep learning frameworks, but does not provide version numbers for any software dependencies. For example: 'synthetic binary classification dataset generated by the sklearn (Pedregosa et al., 2011) make_classification function with 100 features.'
Experiment Setup: Yes. We train a small ResNet on 250 clients using FEDDC with d = 2 and b = 10, postponing the details on the experimental setup to App. A.1.1 and A.1.2. ... We compare FEDDC with daisy-chaining period d = 1 and aggregation period b = 200 to FEDAVG with the same amount of communication (b = 1) and the same averaging period (b = 200). ... We use a setting of 250 clients with a small version of ResNet, and 64 local samples each... We report the results in Figure 5 and set the period for FEDDC to b = 10, and consider federated averaging with periods of both b = 1 (equivalent communication to FEDDC with d = 1, b = 10) and b = 10 (less communication than FEDDC by a factor of 10) for all subsequent experiments. We use a daisy-chaining period of d = 1 for FEDDC throughout all experiments for consistency...
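The setup quoted above interleaves local training with daisy-chaining every d rounds and averaging every b rounds. The toy below sketches that schedule under our reading of the paper's description; `run_feddc`, `local_step`, and the scalar "models" are illustrative stand-ins, not the authors' code.

```python
import random

def run_feddc(models, local_step, rounds, d, b, rng):
    """Toy FedDC-style schedule: a local update every round,
    daisy-chaining (random model migration between clients) every
    d rounds, and averaging every b rounds. Models are plain floats."""
    models = list(models)
    n = len(models)
    for t in range(1, rounds + 1):
        models = [local_step(i, m) for i, m in enumerate(models)]
        if t % d == 0:
            # daisy-chaining: each client continues training the model
            # it receives from another (randomly chosen) client
            perm = list(range(n))
            rng.shuffle(perm)
            models = [models[perm[i]] for i in range(n)]
        if t % b == 0:
            # aggregation: all clients continue from the average model
            avg = sum(models) / n
            models = [avg] * n
    return models
```

With d = 1 and b = 10, as in several of the quoted experiments, each model visits many small local datasets between two aggregation steps, which matches the intuition behind daisy-chaining in the small-sample regime:

```python
rng = random.Random(0)
targets = [0.0, 1.0, 2.0, 3.0]  # hypothetical per-client optima
step = lambda i, m: m + 0.5 * (targets[i] - m)
final = run_feddc([0.0] * 4, step, rounds=10, d=1, b=10, rng=rng)
# after a round that is a multiple of b, all clients hold the same model
```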