Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Data Heterogeneity in Adaptive Federated Learning

Authors: Yujia Wang, Jinghui Chen

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variables (Result and LLM Response per variable)
Research Type: Experimental
LLM Response: We theoretically prove the fast convergence of our proposed method under non-convex stochastic settings and empirically demonstrate its superior performance over vanilla adaptive federated learning with client sampling. Moreover, we extend our framework to a communication-efficient variant, in which clients are divided into disjoint clusters determined by their connectivity or communication capabilities. We exclusively perform local gossip averaging within these clusters, leading to an enhancement in network communication efficiency for our proposed method.
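The clustered variant described above performs gossip averaging only among clients in the same cluster. A minimal sketch of one such intra-cluster averaging step, assuming uniform mixing weights within each disjoint cluster (the paper's Algorithm 2 may use a more general mixing matrix; the function name and data layout here are illustrative, not from the paper):

```python
import numpy as np

def cluster_gossip_average(params, clusters):
    """One local gossip step: average client parameter vectors within each
    disjoint cluster, using uniform mixing weights.

    params:   dict mapping client_id -> parameter vector (np.ndarray)
    clusters: list of disjoint lists of client ids
    Returns a new dict where every client in a cluster holds that
    cluster's mean parameters; no averaging happens across clusters.
    """
    out = {}
    for cluster in clusters:
        # Uniform average over the cluster's members.
        mean = np.mean([params[c] for c in cluster], axis=0)
        for c in cluster:
            out[c] = mean.copy()
    return out
```

Because averaging never crosses cluster boundaries, communication stays local to each cluster, which is the source of the claimed network-efficiency gain.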
Researcher Affiliation: Academia
LLM Response: Yujia Wang (EMAIL), College of Information Sciences and Technology, Pennsylvania State University; Jinghui Chen (EMAIL), College of Information Sciences and Technology, Pennsylvania State University
Pseudocode: Yes
LLM Response: Algorithm 1 (AFGA: Adaptive Federated Learning with Local Gossip Averaging); Algorithm 2 (CAFGA: Clustered-Client Adaptive Federated Learning with Local Gossip Averaging)
Open Source Code: No
LLM Response: The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. The OpenReview link provided is for paper review, not code.
Open Datasets: Yes
LLM Response: We conduct experiments on the CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), and Shakespeare (Caldas et al., 2018) datasets
Dataset Splits: Yes
LLM Response: We conduct experiments on the CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), and Shakespeare (Caldas et al., 2018) datasets with various data sampling levels and client participation settings. We evaluate experiments on non-i.i.d. data distributions via a Dirichlet distribution partitioning strategy with parameter α = 0.6, similar to Wang et al. (2020a;b).
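The Dirichlet partitioning quoted above is a standard way to create non-i.i.d. client splits: for each class, a Dirichlet(α) draw decides what fraction of that class each client receives, with smaller α giving more skewed splits. A minimal sketch, assuming labeled data indexed by integer class labels (the function name and signature are illustrative, not taken from the paper's code):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.6, seed=0):
    """Partition sample indices across clients with per-class proportions
    drawn from a Dirichlet(alpha) distribution.

    labels: 1-D integer array of class labels for the whole dataset
    Returns a list of index arrays, one per client (disjoint, covering all samples).
    """
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Fraction of class-c samples assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, split in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(split.tolist())
    return [np.array(ix, dtype=int) for ix in client_indices]
```

With α = 0.6, as in the paper, each client sees a moderately skewed label distribution; α → ∞ recovers an approximately i.i.d. split.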
Hardware Specification: Yes
LLM Response: All experiments in this paper are conducted on 4 NVIDIA RTX A6000 GPUs.
Software Dependencies: No
LLM Response: The paper mentions using the SGD optimizer, Adam, and AMSGrad, but does not provide specific version numbers for any software libraries, programming languages, or frameworks used for implementation.
Experiment Setup: Yes
LLM Response: The number of local training iterations I on each client is set to 24 for experiments on the CIFAR-10 and CIFAR-100 datasets, and I = 100 for experiments on the Shakespeare dataset; the batch size is set to 50 for all experiments by default. For local updates, we use the SGD optimizer with a learning rate from {0.1, 1} for SGD-based global optimization methods... and use the SGD optimizer with a learning rate from {1, 2, 10} for adaptive global optimization methods. We set the global learning rate to 1 for SGD-based global updates, and to 0.01 for global adaptive optimization (FedAdam, FedAMSGrad, and our proposed AFGA). For the global AMSGrad optimizer, we set β1 = 0.9, β2 = 0.99, and we search for the best ϵ from {10⁻¹⁰, 10⁻⁸, 10⁻⁶, 10⁻⁴}.
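For readability, the hyperparameters quoted above can be collected into a single configuration object. This is a hedged sketch only: the dictionary layout and key names are illustrative and do not come from the paper's code, but every value is taken from the quoted setup.

```python
# Hyperparameters quoted in the experiment setup, gathered into one config.
# Structure and key names are illustrative, not from the paper's code.
config = {
    "local_iterations": {"cifar10": 24, "cifar100": 24, "shakespeare": 100},
    "batch_size": 50,
    "local_lr_grid": {
        "sgd_global": [0.1, 1],        # local SGD lr grid, SGD-based global update
        "adaptive_global": [1, 2, 10], # local SGD lr grid, adaptive global update
    },
    "global_lr": {"sgd": 1.0, "adaptive": 0.01},  # adaptive: FedAdam / FedAMSGrad / AFGA
    "amsgrad": {
        "beta1": 0.9,
        "beta2": 0.99,
        "eps_grid": [1e-10, 1e-8, 1e-6, 1e-4],  # searched for the best epsilon
    },
    "dirichlet_alpha": 0.6,  # non-i.i.d. partition parameter
}
```

Such a config makes the grid-searched values (local learning rate, ϵ) easy to distinguish from the fixed ones (batch size, global learning rates, β1/β2).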