FedNAR: Federated Optimization with Normalized Annealing Regularization
Authors: Junbo Li, Ang Li, Chong Tian, Qirong Ho, Eric Xing, Hongyi Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a comprehensive theoretical analysis of FedNAR's convergence rate and conduct extensive experiments on both vision and language datasets with different backbone federated optimization algorithms. Our experimental results consistently demonstrate that incorporating FedNAR into existing FL algorithms leads to accelerated convergence and heightened model accuracy. |
| Researcher Affiliation | Collaboration | Junbo Li (1), Ang Li (2), Chong Tian (1), Qirong Ho (1), Eric P. Xing (1, 3, 4), Hongyi Wang (3); 1. Mohamed bin Zayed University of Artificial Intelligence, 2. University of Maryland, 3. Carnegie Mellon University, 4. Petuum, Inc. |
| Pseudocode | Yes | Algorithm 1: Round t of FedAvg; Algorithm 2: FedNAR (an illustrative sketch of a norm-capped local update follows this table). |
| Open Source Code | Yes | Our codes are released at https://github.com/ljb121002/fednar. |
| Open Datasets | Yes | We evaluate all the aforementioned algorithms on the CIFAR-10 dataset, partitioned among 100 clients under three different settings... The Shakespeare dataset [29], derived from The Complete Works of William Shakespeare, assigns each speaking role in every play to a unique client... (an assumed partitioning sketch follows this table). |
| Dataset Splits | No | The paper describes how data is partitioned among clients for training and reports test accuracy, but it does not explicitly provide overall training/validation/test dataset splits with specific percentages or counts needed for direct reproduction of dataset partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | For each algorithm, we conduct 1000 rounds of training for full convergence, with 20 steps of local training per round. In each round, we randomly sample 20 clients... We set the local learning rate to 0.01 with a decay of 0.998, and cap the maximum norm at 10... During local training, we utilize a batch size of 100, a learning rate of 0.1 with a decay rate of 0.998 per round, a dropout rate of 0.1, and a maximum norm of 10... Global updates are performed with a learning rate of 1.0. (A configuration sketch collecting these values follows this table.) |
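
The Pseudocode row above points to Algorithm 2 (FedNAR), and the experiment setup quotes a "maximum norm of 10". As a rough illustration of how such a norm cap could enter a local step, here is a minimal PyTorch-style sketch that clips the combined gradient-plus-weight-decay direction before applying it; the function name, the weight-decay value, and the per-tensor (rather than whole-model) clipping are our assumptions, not the authors' implementation, which is available in the linked repository.

```python
import torch

def fednar_local_step(model, loss, lr=0.01, weight_decay=1e-3, max_norm=10.0):
    """Hypothetical local step: clip gradient + weight decay to `max_norm`."""
    loss.backward()  # populate .grad for every parameter
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            # Combine the gradient with the weight-decay term before clipping.
            d = p.grad + weight_decay * p
            # Rescale so the combined direction never exceeds max_norm
            # (clipping is per tensor here purely for brevity).
            norm = torch.linalg.vector_norm(d)
            scale = torch.clamp(max_norm / (norm + 1e-12), max=1.0)
            p.sub_(lr * scale * d)
    model.zero_grad()
```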
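The Open Datasets row notes that CIFAR-10 is "partitioned among 100 clients under three different settings", but the excerpt does not state how the partitioning is done. A Dirichlet label split is a common way to create non-IID client shards in federated learning, so the sketch below is only an assumption; `alpha` and the function name are placeholders rather than values from the paper.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=100, alpha=0.3, seed=0):
    """Assign dataset indices to clients using a per-class Dirichlet split."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in range(labels.max() + 1):
        idx = rng.permutation(np.where(labels == c)[0])
        # Draw per-client proportions for this class, then cut accordingly.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, shard in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices
```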
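The Experiment Setup row quotes two groups of hyperparameters; the first group (learning rate 0.01) appears to describe the CIFAR-10 runs and the second (batch size 100, dropout 0.1) the Shakespeare runs, though the excerpt does not say so explicitly. The dictionaries below simply collect the quoted values under field names of our choosing.

```python
# Hypothetical configurations; the values are quoted from the paper, while the
# split between the two datasets is only our reading of the excerpt.
cifar10_cfg = dict(
    num_rounds=1000,        # rounds of federated training
    local_steps=20,         # local training steps per round
    clients_per_round=20,   # clients sampled at random each round
    num_clients=100,        # total clients in the partition
    local_lr=0.01,          # local learning rate
    lr_decay=0.998,         # multiplicative decay per round
    max_norm=10.0,          # cap on the update norm
    server_lr=1.0,          # global (server) learning rate
)

shakespeare_cfg = dict(
    batch_size=100,         # local batch size
    local_lr=0.1,           # local learning rate
    lr_decay=0.998,         # multiplicative decay per round
    dropout=0.1,            # dropout rate in the local model
    max_norm=10.0,          # cap on the update norm
    server_lr=1.0,          # global (server) learning rate
)
```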