Handling Data Heterogeneity via Architectural Design for Federated Visual Recognition
Authors: Sara Pieri, Jose Restom, Samuel Horváth, Hisham Cholakkal
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through an in-depth analysis of diverse cutting-edge architectures such as convolutional neural networks, transformers, and MLP-mixers, we experimentally demonstrate that architectural choices can substantially enhance the performance of FL systems, particularly when handling heterogeneous data. We study 19 visual recognition models from five different architectural families on four challenging FL datasets. |
| Researcher Affiliation | Academia | Sara Pieri, Jose Renato Restom, Samuel Horvath, Hisham Cholakkal; Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor are there any structured blocks of code or step-by-step procedures presented in a pseudocode format. |
| Open Source Code | Yes | Our source code is available at https://github.com/sarapieri/fed_het.git. |
| Open Datasets | Yes | We conduct experiments on four different datasets, CIFAR-10 [31], CelebA [46], Fed-ISIC2019 [68, 5, 7] and Google Landmarks Dataset v2 (GLD-23K) [72]. |
| Dataset Splits | Yes | The original test set is retained as the global test set, and 5,000 images are set aside from the training set for validation purposes, resulting in a revised training dataset of 45,000 images. We employ the Kolmogorov-Smirnov statistic (KS) to simulate one Independent and Identically Distributed (IID) data partition, namely Split-1 (KS=0), ensuring balanced labels per client, and two non-IID partitions, Split-2 (KS=0.5) and Split-3 (KS=1), with label distribution skew. In Split-2, each client has access to four classes and does not receive samples from the remaining six classes. In Split-3, each client strictly sees samples from two classes only. (A label-skew partitioning sketch follows after this table.) |
| Hardware Specification | Yes | We simulate the federated learning setup (1 server and N devices) on a single machine with 32 Intel(R) Xeon(R) Silver 4215 CPUs and 1 NVidia Quadro RTX 6000 GPU. |
| Software Dependencies | No | The paper mentions general tools and frameworks such as 'Flower' and the 'LEAF benchmark' but does not provide version numbers for software dependencies such as the deep learning framework (e.g., PyTorch, TensorFlow) or other libraries used in the experiments. |
| Experiment Setup | Yes | Our approach involves performing FedAVG for a specific number of communication rounds to reach convergence: 100 rounds for both the CIFAR-10 and Fed-ISIC2019 datasets, 30 rounds for the CelebA dataset, and 200 rounds for the GLD-23K dataset. With the exception of the GLD-23K dataset, where we conduct five local steps, each round of local training encompasses one local epoch. The training regimen employs the SGD optimizer with an initial learning rate of 0.03, cosine decay, and a warm-up phase of 100 steps. We set the local training batch size at 32 and apply gradient clipping with a unitary norm to stabilize the training process. (A local-training sketch based on these settings follows after this table.) |
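The Dataset Splits row maps naturally to a small partitioning routine. The sketch below is a minimal illustration of a Split-2/Split-3 style label-skew partition for CIFAR-10; the function name `label_skew_partition`, the round-robin class assignment, and the seed handling are assumptions for illustration, not the authors' released splitting code, and this simple scheme does not explicitly enforce a target Kolmogorov-Smirnov value.

```python
import numpy as np

def label_skew_partition(labels, num_clients, classes_per_client, seed=0):
    """Split a labelled dataset across clients so that each client only
    sees `classes_per_client` classes (4 for Split-2, 2 for Split-3).

    Illustrative sketch only; not the paper's partitioning script.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1

    # Assign classes to clients round-robin so coverage wraps around the label set.
    client_classes = [
        [(start + k) % num_classes for k in range(classes_per_client)]
        for start in range(0, num_clients * classes_per_client, classes_per_client)
    ]

    # Invert the mapping: which clients own each class?
    class_owners = {c: [] for c in range(num_classes)}
    for client, owned in enumerate(client_classes):
        for c in owned:
            class_owners[c].append(client)

    # Shuffle each class's sample indices and share them among its owners.
    client_indices = [[] for _ in range(num_clients)]
    for c, owners in class_owners.items():
        if not owners:  # class left unassigned under this toy scheme
            continue
        idx = rng.permutation(np.where(labels == c)[0])
        for owner, chunk in zip(owners, np.array_split(idx, len(owners))):
            client_indices[owner].extend(chunk.tolist())
    return client_indices
```

For CIFAR-10's 10 classes, `classes_per_client=4` mimics the Split-2 pattern of four visible classes per client, and `classes_per_client=2` mimics the two-class Split-3 pattern.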
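Similarly, the Experiment Setup row (SGD at 0.03, cosine decay, 100 warm-up steps, batch size 32, unit-norm gradient clipping, one local epoch per round) can be read as the following PyTorch sketch of a client's local update. The helper names, the default device, and the way warm-up and cosine decay are composed are assumptions, not the repository's exact implementation.

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

def make_local_optimizer(model, total_steps, lr=0.03, warmup_steps=100):
    """SGD with a 100-step linear warm-up followed by cosine decay.

    One plausible reading of the paper's schedule description.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    def schedule(step):
        if step < warmup_steps:
            return (step + 1) / warmup_steps
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return optimizer, LambdaLR(optimizer, lr_lambda=schedule)


def local_epoch(model, loader, optimizer, scheduler, loss_fn, device="cuda"):
    """One round of local training (one epoch; a batch-size-32 loader is assumed)."""
    model.train()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), targets)
        loss.backward()
        # Gradient clipping with a unitary norm, as stated in the setup.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()
```

In a FedAVG round, each selected client would run `local_epoch` once (or five local steps for GLD-23K) before its weights are averaged on the server.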