FedNAR: Federated Optimization with Normalized Annealing Regularization

Authors: Junbo Li, Ang Li, Chong Tian, Qirong Ho, Eric Xing, Hongyi Wang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a comprehensive theoretical analysis of FedNAR's convergence rate and conduct extensive experiments on both vision and language datasets with different backbone federated optimization algorithms. Our experimental results consistently demonstrate that incorporating FedNAR into existing FL algorithms leads to accelerated convergence and heightened model accuracy.
Researcher Affiliation | Collaboration | Junbo Li (1), Ang Li (2), Chong Tian (1), Qirong Ho (1), Eric P. Xing (1,3,4), Hongyi Wang (3). Affiliations: (1) Mohamed bin Zayed University of Artificial Intelligence; (2) University of Maryland; (3) Carnegie Mellon University; (4) Petuum, Inc.
Pseudocode | Yes | Algorithm 1: Round t of FedAvg; Algorithm 2: FedNAR. (An illustrative sketch of such a round is given after this table.)
Open Source Code | Yes | Our codes are released at https://github.com/ljb121002/fednar.
Open Datasets | Yes | We evaluate all the aforementioned algorithms on the CIFAR-10 dataset, partitioned among 100 clients under three different settings... The Shakespeare dataset [29], derived from The Complete Works of William Shakespeare, assigns each speaking role in every play to a unique client... (A hypothetical client-partition sketch is given after this table.)
Dataset Splits | No | The paper describes how data is partitioned among clients for training and reports test accuracy, but it does not explicitly provide overall train/validation/test splits (percentages or counts) needed to reproduce the data setup directly.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | For each algorithm, we conduct 1000 rounds of training for full convergence, with 20 steps of local training per round. In each round, we randomly sample 20 clients... We set the local learning rate to 0.01 with a decay of 0.998, and cap the maximum norm at 10... During local training, we utilize a batch size of 100, a learning rate of 0.1 with a decay rate of 0.998 per round, a dropout rate of 0.1, and a maximum norm of 10... Global updates are performed with a learning rate of 1.0. (These values are collected into a configuration sketch after this table.)
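
To make the Pseudocode entry concrete, here is a minimal sketch of one FedAvg-style round with a capped local-update norm, in the spirit of Algorithms 1 and 2. The clipping rule, the weight-decay value, and the helper names (clip_by_norm, local_update, server_round) are illustrative assumptions rather than the paper's exact FedNAR procedure; the released repository is the authoritative implementation.

```python
# Minimal sketch of one FedAvg-style round with a capped local-update norm.
# The clipping rule, weight-decay value, and helper names are illustrative
# assumptions, not the paper's exact Algorithm 1/2; see the released code at
# https://github.com/ljb121002/fednar for the authoritative implementation.
import copy
import itertools

import torch


def clip_by_norm(vec: torch.Tensor, max_norm: float) -> torch.Tensor:
    """Scale vec down so its L2 norm does not exceed max_norm."""
    norm = vec.norm()
    return vec if norm <= max_norm else vec * (max_norm / norm)


def local_update(model, loader, lr=0.01, weight_decay=1e-3, max_norm=10.0, steps=20):
    """Run `steps` SGD steps on one client's data; return the clipped model delta."""
    model = copy.deepcopy(model)
    init = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = torch.nn.CrossEntropyLoss()
    for x, y in itertools.islice(itertools.cycle(loader), steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    delta = torch.nn.utils.parameters_to_vector(model.parameters()).detach() - init
    return clip_by_norm(delta, max_norm)


def server_round(global_model, sampled_client_loaders, global_lr=1.0):
    """Average the clipped client deltas and apply them with the global learning rate."""
    deltas = [local_update(global_model, ld) for ld in sampled_client_loaders]
    avg_delta = torch.stack(deltas).mean(dim=0)
    params = torch.nn.utils.parameters_to_vector(global_model.parameters()).detach()
    torch.nn.utils.vector_to_parameters(params + global_lr * avg_delta,
                                        global_model.parameters())
```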
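
The Open Datasets entry quotes a partition of CIFAR-10 among 100 clients but does not spell out the three partition settings, so the Dirichlet-based split below is only a hypothetical illustration of how such a client partition can be generated, not the authors' recipe.

```python
# Hypothetical sketch of a non-IID partition of CIFAR-10 across 100 clients.
# The Dirichlet concentration alpha and the overall scheme are assumptions;
# the paper's three partition settings are not specified in the quoted text.
import numpy as np
import torchvision


def dirichlet_partition(labels, num_clients=100, alpha=0.5, seed=0):
    """Assign sample indices to clients with Dirichlet(alpha) class proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, shard in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices


train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)
parts = dirichlet_partition(train_set.targets, num_clients=100, alpha=0.5)
```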
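
Finally, the hyperparameters quoted in the Experiment Setup entry can be collected into a single configuration sketch. The key names are our own labels rather than identifiers from the authors' code; note that the quote lists both 0.01 and 0.1 as local learning rates, corresponding to different experiments.

```python
# Hyperparameters quoted in the Experiment Setup row, gathered into one
# illustrative config dict. Key names are ours, not the authors'.
FEDNAR_CONFIG = {
    "rounds": 1000,            # communication rounds for full convergence
    "local_steps": 20,         # local training steps per round
    "clients_per_round": 20,   # clients sampled each round
    "local_lr": 0.01,          # 0.1 is quoted for the other experimental setting
    "lr_decay_per_round": 0.998,
    "max_norm": 10.0,          # cap on the maximum norm
    "batch_size": 100,
    "dropout": 0.1,
    "global_lr": 1.0,
}
```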