Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On the Data Heterogeneity in Adaptive Federated Learning
Authors: Yujia Wang, Jinghui Chen
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically prove the fast convergence for our proposed method under non-convex stochastic settings and empirically demonstrate its superior performance over vanilla adaptive federated learning with client sampling. Moreover, we extend our framework to a communication-efficient variant, in which clients are divided into disjoint clusters determined by their connectivity or communication capabilities. We exclusively perform local gossip averaging within these clusters, leading to an enhancement in network communication efficiency for our proposed method. |
| Researcher Affiliation | Academia | Yujia Wang, College of Information Sciences and Technology, Pennsylvania State University; Jinghui Chen, College of Information Sciences and Technology, Pennsylvania State University |
| Pseudocode | Yes | Algorithm 1 (AFGA): Adaptive Federated Learning with Local Gossip Averaging; Algorithm 2 (CAFGA): Clustered-Client Adaptive Federated Learning with Local Gossip Averaging |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide a link to a code repository. The OpenReview link provided is for paper review, not code. |
| Open Datasets | Yes | We conduct experiments on the CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009) and Shakespeare (Caldas et al., 2018) datasets |
| Dataset Splits | Yes | We conduct experiments on the CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009) and Shakespeare (Caldas et al., 2018) datasets with various data sampling levels and client participation settings. We evaluate experiments on non-i.i.d. data distributions using a Dirichlet-distribution partitioning strategy with parameter α = 0.6, similar to Wang et al. (2020a;b). |
| Hardware Specification | Yes | All experiments in this paper are conducted on 4 NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions using SGD optimizer, Adam, and AMSGrad, but does not provide specific version numbers for any software libraries, programming languages, or frameworks used for implementation. |
| Experiment Setup | Yes | The number of local training iterations I on each client is set to 24 for experiments on the CIFAR-10 and CIFAR-100 datasets, and I = 100 for experiments on the Shakespeare dataset; the batch size is set to 50 for all experiments by default. For the local update, we use the SGD optimizer with a learning rate from {0.1, 1} for SGD-based global optimization methods... and use the SGD optimizer with a learning rate from {1, 2, 10} for adaptive global optimization methods. We set the global learning rate to 1 for the SGD-based global update, and to 0.01 for global adaptive optimization, FedAdam, FedAMSGrad, and our proposed AFGA. For the global AMSGrad optimizer, we set β1 = 0.9, β2 = 0.99, and we search for the best ϵ in {10⁻¹⁰, 10⁻⁸, 10⁻⁶, 10⁻⁴}. |
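The Dataset Splits row reports a Dirichlet-based non-i.i.d. partition with α = 0.6. As a minimal sketch of how such a partition is commonly implemented (this is an illustration, not the paper's released code; the function name `dirichlet_partition` is our own):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.6, seed=0):
    """Partition sample indices across clients with a Dirichlet prior.

    For each class, a Dirichlet(alpha) draw decides what fraction of that
    class's samples each client receives; smaller alpha yields more skew.
    """
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_idx = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # proportions of this class assigned to each client
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_idx[client].extend(part.tolist())
    return client_idx
```

With α = 0.6 (as reported), each client's class distribution is moderately skewed; the partition is disjoint and covers every sample exactly once.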
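The Experiment Setup row lists the global AMSGrad hyperparameters (global learning rate 0.01, β1 = 0.9, β2 = 0.99, ϵ searched over a grid). A hedged sketch of a server-side AMSGrad step on the averaged client update, under the common convention that `delta` is the mean of the clients' model differences (this is a generic illustration of AMSGrad, not the paper's AFGA algorithm, which additionally involves local gossip averaging):

```python
import numpy as np

def amsgrad_server_step(x, delta, state, eta=0.01,
                        beta1=0.9, beta2=0.99, eps=1e-8):
    """One server-side AMSGrad step on the averaged client update `delta`.

    `state` holds (m, v, v_hat), each initialized to zeros_like(x).
    Defaults mirror the paper's reported settings (eta=0.01, beta1=0.9,
    beta2=0.99); eps=1e-8 is one value from the searched grid.
    """
    m, v, v_hat = state
    m = beta1 * m + (1 - beta1) * delta          # first moment
    v = beta2 * v + (1 - beta2) * delta**2       # second moment
    v_hat = np.maximum(v_hat, v)                 # AMSGrad: monotone max
    x = x + eta * m / (np.sqrt(v_hat) + eps)     # server model update
    return x, (m, v, v_hat)
```

The `np.maximum` line is what distinguishes AMSGrad from Adam: the effective second moment never decreases, which underpins its convergence guarantees.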