Adaptive Federated Learning with Auto-Tuned Clients

Authors: Junhyung Lyle Kim, Mohammad Taha Toghani, César A. Uribe, Anastasios Kyrillidis

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide theoretical and empirical results where the benefit of client adaptivity is shown in various FL scenarios.
Researcher Affiliation | Academia | Department of Computer Science, Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
Pseudocode | Yes | We make a few remarks on Algorithm 1. First, the input θ0 > 0 can be quite arbitrary, as it is corrected, per client, in the first local iteration (line 10); similarly for η0 > 0, although η0 should be sufficiently small to prevent divergence in the first local step. Second, we include the amplifier γ in the first condition of the step size (line 9), but this is only needed for Theorem 1. Last, ∇f_i(x^i_{t,k−1}) shows up twice: in updating x^i_{t,k} (line 8) and η^i_{t,k} (line 9). Thus, one can use the same or different batches; we use the same batches in experiments to prevent additional gradient evaluations. (A sketch of this step-size rule appears below the table.)
Open Source Code | Yes | Our implementation can be found at https://github.com/jlylekim/auto-tuned-FL.
Open Datasets | Yes | We use four datasets for image classification: MNIST, FMNIST, CIFAR-10, and CIFAR-100 (Krizhevsky et al., 2009). For text classification, we use two datasets: DBpedia and AGnews (Zhang et al., 2015).
Dataset Splits | No | The paper describes how the training data is partitioned among clients and which mini-batch sizes are used for training. However, it does not explicitly provide the percentages or counts for training, validation, and test splits needed for reproduction, nor does it mention a dedicated validation set.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU specifications, or memory sizes used for running the experiments. It only mentions general terms like "computing power" or "computing device".
Software Dependencies | No | The paper mentions "Pytorch" in the context of using Adam's default settings, but it does not specify version numbers for PyTorch or any other critical software libraries used in the implementation, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | For each optimizer, we perform a grid search of learning rates on a single task: CIFAR-10 classification trained with a ResNet-18, with Dirichlet concentration parameter α = 0.1; for the rest of the settings, we use the same learning rates. For SGD, we perform a grid search with η ∈ {0.01, 0.05, 0.1, 0.5}. For SGDM, we use the same grid for η and use momentum parameter β = 0.9. To properly account for the SGD(M) fine-tuning typically done in practice, we also test dividing the step size by 10 after 50%, and again by 10 after 75%, of the total training rounds (LR decay). For Adam and Adagrad, we grid search with η ∈ {0.001, 0.01, 0.1}. For SPS, we use the default setting of the official implementation. For Δ-SGD, we append δ in front of the second condition, i.e., √(1 + δθ^i_{t,k−1}) η^i_{t,k−1}, following Malitsky & Mishchenko (2020), and use δ = 0.1 for all experiments. Finally, for the number of rounds T, we use 500 for MNIST, 1000 for FMNIST, 2000 for CIFAR-10 and CIFAR-100, and 100 for the text classification tasks. (Sketches of the Dirichlet client partition and of these grids, LR decay, and round budgets appear below the table.)
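
To make the step-size rule referenced in the Pseudocode and Experiment Setup rows concrete, here is a minimal NumPy sketch of one client's local Δ-SGD loop. It is not the authors' implementation (see the linked repository for that): the function name delta_sgd_client, the flat-vector parameterization, the grad_fn callback, and the default values of γ and the initial θ are assumptions; the placement of the amplifier γ on the first condition and of δ inside the square root of the second condition follows the remarks quoted above.

```python
import numpy as np

def delta_sgd_client(x_server, grad_fn, local_steps, eta0=1e-3, theta0=1.0,
                     gamma=2.0, delta=0.1, eps=1e-12):
    """One client's local Delta-SGD loop (illustrative sketch, not the official code).

    x_server : flat parameter vector received from the server
    grad_fn  : grad_fn(x) -> stochastic gradient at x; the stored gradient is
               reused for both the parameter update and the step-size update,
               matching the "same batches" choice described in the paper
    eta0     : initial step size (kept small to avoid divergence in the first step)
    theta0   : initial step-size ratio (fairly arbitrary; corrected after one step)
    gamma    : amplifier on the first (gradient-ratio) condition
    delta    : dampening factor inside the sqrt of the second (growth) condition
    """
    x_prev = x_server.copy()
    g_prev = grad_fn(x_prev)
    eta_prev, theta_prev = eta0, theta0

    for _ in range(local_steps):
        # Parameter update with the current step size (Algorithm 1, line 8).
        x = x_prev - eta_prev * g_prev
        # Auto-tuned step size (line 9): minimum of the amplified gradient-ratio
        # condition and the dampened growth condition.
        g = grad_fn(x)
        cond_ratio = gamma * np.linalg.norm(x - x_prev) / (2.0 * np.linalg.norm(g - g_prev) + eps)
        cond_growth = np.sqrt(1.0 + delta * theta_prev) * eta_prev
        eta = min(cond_ratio, cond_growth)
        # Ratio of consecutive step sizes (line 10); this is what corrects an
        # arbitrary theta0 after the first local iteration.
        theta = eta / eta_prev
        x_prev, g_prev, eta_prev, theta_prev = x, g, eta, theta

    return x_prev
```

The server side, omitted here, would aggregate the returned client vectors (e.g., FedAvg-style averaging over the sampled clients) before the next communication round.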
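The Dataset Splits and Experiment Setup rows mention that the training data is partitioned across clients with a Dirichlet concentration parameter α (α = 0.1 in the grid-search task). As a point of reference, below is a common label-based Dirichlet partition in NumPy; it is a generic sketch rather than the authors' partitioning code, and the function name, seed handling, and rounding of split points are assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.1, seed=0):
    """Split example indices across clients with a per-class Dirichlet prior.

    Smaller alpha -> more heterogeneous (non-IID) client label distributions.
    Returns a list of index arrays, one per client.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]

    for c in range(num_classes):
        idx_c = rng.permutation(np.where(labels == c)[0])
        # Proportion of class-c samples assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert proportions to split points over the class-c indices.
        split_points = (np.cumsum(proportions)[:-1] * len(idx_c)).astype(int)
        for client_id, shard in enumerate(np.split(idx_c, split_points)):
            client_indices[client_id].extend(shard.tolist())

    return [np.array(ix) for ix in client_indices]
```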
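Finally, the grid-search ranges, LR-decay schedule, and round budgets in the Experiment Setup row can be collected into a small configuration sketch. The dictionary layout, key names, and the helper function below are illustrative; only the numerical values are taken from the quoted text.

```python
# Hyperparameter grids and budgets as described in the Experiment Setup row.
# The structure and key names are illustrative; the values follow the quoted text.
LR_GRIDS = {
    "sgd": [0.01, 0.05, 0.1, 0.5],
    "sgdm": [0.01, 0.05, 0.1, 0.5],   # momentum beta = 0.9
    "adam": [0.001, 0.01, 0.1],
    "adagrad": [0.001, 0.01, 0.1],
}
MOMENTUM = {"sgdm": 0.9}
DELTA_SGD = {"delta": 0.1}            # appended inside sqrt(1 + delta * theta)

ROUNDS = {                            # number of communication rounds T
    "mnist": 500,
    "fmnist": 1000,
    "cifar10": 2000,
    "cifar100": 2000,
    "dbpedia": 100,
    "agnews": 100,
}

def decayed_lr(base_lr, round_idx, total_rounds):
    """SGD(M) LR-decay variant: divide the step size by 10 after 50% and
    again by 10 after 75% of the total training rounds."""
    lr = base_lr
    if round_idx >= 0.5 * total_rounds:
        lr /= 10.0
    if round_idx >= 0.75 * total_rounds:
        lr /= 10.0
    return lr
```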