Momentum Benefits Non-iid Federated Learning Simply and Provably
Authors: Ziheng Cheng, Xinmeng Huang, Pengfei Wu, Kun Yuan
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results support all theoretical findings. ... We present experiments on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009) with two neural networks (three-layer MLP, ResNet-18) to justify the efficacy of our proposed algorithms. We evaluate them along with baselines including FEDAVG (Konečný et al., 2016), SCAFFOLD (Karimireddy et al., 2020b), MB-STORM, CE-LSGD (Patel et al., 2022). |
| Researcher Affiliation | Academia | Ziheng Cheng (Peking University, alex-czh@stu.pku.edu.cn); Xinmeng Huang (University of Pennsylvania, xinmengh@sas.upenn.edu); Pengfei Wu (Peking University, pengfeiwu1999@stu.pku.edu.cn); Kun Yuan (Peking University, kunyuan@pku.edu.cn) |
| Pseudocode | Yes | Algorithm 1 FEDAVG-M: FEDAVG with momentum... Algorithm 2 SCAFFOLD-M: SCAFFOLD with momentum... Algorithm 3 FEDAVG-M-VR: FEDAVG with variance-reduced momentum... Algorithm 4 SCAFFOLD-M-VR: SCAFFOLD with variance-reduced momentum (an illustrative FedAvg-M sketch follows this table) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code or a link to a code repository. |
| Open Datasets | Yes | We present experiments on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009)... We further conduct experiments with N = 100 on the MNIST dataset... |
| Dataset Splits | No | The paper mentions generating non-iid data using the Dirichlet distribution and setting hyperparameters via grid search, but it does not provide specific percentages or counts for training, validation, and test splits. (A Dirichlet partitioning sketch follows this table.) |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used to run the experiments. It only mentions general setups like 'large-scale machine learning' or 'edge devices' in theoretical context. |
| Software Dependencies | No | The paper mentions 'current Pytorch implementation of momentum-based methods' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The MLP experiments involve K = 32 local updates and N = 10 clients... The weight decay is set to 10^-4. The global learning rate is fixed as γ = ηK for all algorithms, and we perform a grid search for the local learning rate η over {0.005, 0.01, 0.05, 0.1, 0.5} and for the momentum parameter β over {0.1, 0.2, 0.5, 0.8}. ... The experiment involves N = 10 clients and K = 16 local updates. We set S = 2 for partial client participation. The local learning rate is fixed as η̂ = 0.001 and the global learning rate is γ̂ = η̂K. The momentum parameter is β = 0.1 and the batch size is 128. (A grid-search sketch also follows this table.) |
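
The pseudocode row above lists four momentum algorithms, but the paper releases no code. As a rough illustration of the simplest, FEDAVG-M, here is a minimal PyTorch sketch of one communication round: each client runs K local SGD steps, then the server averages the clients' pseudo-gradients and applies a momentum step. The momentum convention, hyperparameter defaults, and helper structure are assumptions for illustration, not the authors' implementation.

```python
import copy
import torch
import torch.nn.functional as F

def fedavg_m_round(global_model, client_loaders, momentum_buf,
                   eta=0.05, beta=0.1, K=32):
    """One round of FedAvg-M (illustrative sketch, not the paper's code).

    eta: local learning rate; beta: momentum parameter; the global
    learning rate is gamma = eta * K, matching the paper's setup.
    Assumes every state_dict entry is a float parameter (e.g., an MLP
    without BatchNorm buffers) and each loader yields >= K batches.
    """
    gamma = eta * K
    global_state = {k: v.clone() for k, v in global_model.state_dict().items()}
    deltas = []
    for loader in client_loaders:
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=eta)
        data_iter = iter(loader)
        for _ in range(K):  # K local SGD steps
            x, y = next(data_iter)
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
        # Client pseudo-gradient: (x_global - x_local) / (eta * K).
        deltas.append({k: (global_state[k] - v) / (eta * K)
                       for k, v in model.state_dict().items()})
    # Average pseudo-gradients over participating clients.
    avg = {k: sum(d[k] for d in deltas) / len(deltas) for k in global_state}
    for k in global_state:
        # Momentum update m <- beta*m + (1-beta)*g; the exact
        # convention here is an assumption for illustration.
        momentum_buf[k].mul_(beta).add_(avg[k], alpha=1 - beta)
        global_state[k] -= gamma * momentum_buf[k]
    global_model.load_state_dict(global_state)
```

The momentum buffer can be initialized once as `momentum_buf = {k: torch.zeros_like(v) for k, v in global_model.state_dict().items()}` and carried across rounds.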
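
The dataset-splits row notes that non-iid client data were generated with a Dirichlet distribution. A common recipe for this is sketched below in NumPy; the concentration parameter `alpha` and the seed are illustrative assumptions, since the report does not record the values the paper used.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, alpha=0.1, seed=0):
    """Split sample indices among clients with a Dirichlet label prior.

    Smaller alpha -> more heterogeneous (non-iid) client data.
    alpha=0.1 is an illustrative choice, not the paper's value.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Draw per-client proportions of this class, then cut the
        # shuffled index list at the corresponding boundaries.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props) * len(idx)).astype(int)[:-1]
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return client_indices  # e.g., pass CIFAR-10 train labels as `labels`
```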
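
Finally, the experiment-setup row describes a grid search over the local learning rate η and the momentum parameter β. A sketch of that search is below, where `run_trial` is a hypothetical stand-in for one full training run.

```python
import itertools

def run_trial(eta, beta):
    """Hypothetical stand-in: train with local learning rate eta and
    momentum beta, then return final test accuracy."""
    return 0.0  # placeholder; a real run would train and evaluate

# Grids reported in the paper's setup.
etas = [0.005, 0.01, 0.05, 0.1, 0.5]
betas = [0.1, 0.2, 0.5, 0.8]
best_eta, best_beta = max(itertools.product(etas, betas),
                          key=lambda cfg: run_trial(*cfg))
print(f"best: eta={best_eta}, beta={best_beta}")
```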