Momentum Benefits Non-iid Federated Learning Simply and Provably

Authors: Ziheng Cheng, Xinmeng Huang, Pengfei Wu, Kun Yuan

ICLR 2024

Reproducibility assessment: each variable below lists the result followed by the supporting LLM response.
Research Type: Experimental
LLM response: "Our experimental results support all theoretical findings." From Section 5 (Experiments): "We present experiments on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009) with two neural networks (three-layer MLP, ResNet-18) to justify the efficacy of our proposed algorithms. We evaluate them along with baselines including FEDAVG (Konečný et al., 2016), SCAFFOLD (Karimireddy et al., 2020b), MB-STORM, and CE-LSGD (Patel et al., 2022)."
Researcher Affiliation: Academia
LLM response: Ziheng Cheng, Peking University (alex-czh@stu.pku.edu.cn); Xinmeng Huang, University of Pennsylvania (xinmengh@sas.upenn.edu); Pengfei Wu, Peking University (pengfeiwu1999@stu.pku.edu.cn); Kun Yuan, Peking University (kunyuan@pku.edu.cn)
Pseudocode: Yes
LLM response: "Algorithm 1 FEDAVG-M: FEDAVG with momentum; Algorithm 2 SCAFFOLD-M: SCAFFOLD with momentum; Algorithm 3 FEDAVG-M-VR: FEDAVG with variance-reduced momentum; Algorithm 4 SCAFFOLD-M-VR: SCAFFOLD with variance-reduced momentum"
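To make the listed FEDAVG-M recipe concrete, the sketch below runs FedAvg with client-side momentum on a toy problem. It is a minimal illustration, not the paper's code: the quadratic objective, the per-client momentum buffers, and their persistence across rounds are assumptions here; the paper's Algorithm 1 is the authoritative specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-iid setup: each client minimizes a quadratic with its own optimum.
# (Hypothetical objective, chosen only so the script runs end to end.)
N, K, d = 10, 32, 5          # clients, local steps, model dimension
eta, beta = 0.05, 0.5        # local lr and momentum (values from the paper's grid)
gamma = eta * K              # global lr, matching the paper's rule gamma = eta*K
optima = rng.normal(size=(N, d))  # client-specific minimizers -> heterogeneity

def local_grad(i, y):
    """Stochastic gradient of 0.5*||y - optima[i]||^2 with additive noise."""
    return (y - optima[i]) + 0.1 * rng.normal(size=d)

x = np.zeros(d)              # server model
v = np.zeros((N, d))         # per-client momentum buffers (assumption: buffers
                             # persist across rounds; Algorithm 1 fixes the
                             # exact bookkeeping)
for rnd in range(100):
    deltas = []
    for i in range(N):       # full participation, for simplicity
        y = x.copy()
        for _ in range(K):   # K local momentum-SGD steps
            v[i] = beta * v[i] + (1 - beta) * local_grad(i, y)
            y -= eta * v[i]
        deltas.append((x - y) / (eta * K))   # normalized local progress
    x -= gamma * np.mean(deltas, axis=0)     # server step

print("distance to average optimum:", np.linalg.norm(x - optima.mean(axis=0)))
```

With gamma = eta*K, the rule used throughout the paper's experiments, the server step reduces to plain averaging of the local models, which is why that choice is a natural default.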
Open Source Code: No
LLM response: "The paper does not contain any explicit statement about releasing open-source code or a link to a code repository."
Open Datasets: Yes
LLM response: "We present experiments on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009)... We further conduct experiments with N = 100 on the MNIST dataset..."
Dataset Splits: No
LLM response: "The paper mentions generating non-iid data using the Dirichlet distribution and setting hyperparameters via grid search, but it does not provide specific percentages or counts for training, validation, and test splits."
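Although the paper does not give split counts, the Dirichlet-based non-iid partition it mentions follows a standard recipe, sketched below. The concentration parameter alpha and the label-wise scheme are illustrative assumptions, since the paper does not specify them.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients, label by label, with
    per-label client proportions drawn from Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        p = rng.dirichlet(alpha * np.ones(n_clients))  # class-c proportions
        cuts = (np.cumsum(p)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_idx[client].extend(shard.tolist())
    return client_idx

# Example: 10 clients over fake CIFAR-10-sized labels.
labels = np.random.default_rng(1).integers(0, 10, size=50_000)
parts = dirichlet_partition(labels, n_clients=10, alpha=0.5)
print([len(p) for p in parts])
```

Smaller alpha concentrates each label on fewer clients, i.e. stronger heterogeneity; very large alpha recovers a near-iid split.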
Hardware Specification: No
LLM response: "The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used to run the experiments. It only mentions general setups like 'large-scale machine learning' or 'edge devices' in a theoretical context."
Software Dependencies: No
LLM response: "The paper mentions the 'current Pytorch implementation of momentum-based methods' but does not provide specific version numbers for PyTorch or any other software dependencies."
Experiment Setup: Yes
LLM response: "The MLP experiments involve K = 32 local updates and N = 10 clients... The weight decay is set as 10^-4. The global learning rate is fixed as γ = ηK for all the algorithms, and we perform a grid search for the local learning rate η in {0.005, 0.01, 0.05, 0.1, 0.5}. Similarly, we search for the momentum parameter β in {0.1, 0.2, 0.5, 0.8}. ... The experiment involves N = 10 clients and K = 16 local updates. We set S = 2 in partial client participation. The local learning rate is fixed as η̂ = 0.001 and the global learning rate is γ̂ = η̂K. The momentum parameter is β = 0.1 and the batch size is 128."
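The quoted setup maps directly onto a small hyperparameter grid. The sketch below records it as a Python configuration; the dictionary layout and sweep loop are my framing, while the numeric values are the ones quoted above.

```python
from itertools import product

# Values quoted in the paper's experiment setup; the structure is illustrative.
mlp_setup = {
    "clients_N": 10,
    "local_updates_K": 32,
    "weight_decay": 1e-4,
    "local_lr_eta": [0.005, 0.01, 0.05, 0.1, 0.5],  # grid-searched
    "momentum_beta": [0.1, 0.2, 0.5, 0.8],          # grid-searched
}

# Partial-participation run quoted above (S = 2 of N = 10 clients per round).
partial_setup = {
    "clients_N": 10,
    "local_updates_K": 16,
    "participating_S": 2,
    "local_lr_eta": 0.001,
    "momentum_beta": 0.1,
    "batch_size": 128,
}

def derived_global_lr(eta, K):
    """Global learning rate rule used for all algorithms: gamma = eta * K."""
    return eta * K

# Enumerate the (eta, beta) grid; each pair fixes gamma via gamma = eta * K.
for eta, beta in product(mlp_setup["local_lr_eta"], mlp_setup["momentum_beta"]):
    gamma = derived_global_lr(eta, mlp_setup["local_updates_K"])
    print(f"eta={eta}, beta={beta}, gamma={gamma}")
```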