Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Momentum Benefits Non-iid Federated Learning Simply and Provably
Authors: Ziheng Cheng, Xinmeng Huang, Pengfei Wu, Kun Yuan
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results support all theoretical findings. ... 5 EXPERIMENTS We present experiments on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009) with two neural networks (three-layer MLP, ResNet-18) to justify the efficacy of our proposed algorithms. We evaluate them along with baselines including FEDAVG (Konečný et al., 2016), SCAFFOLD (Karimireddy et al., 2020b), MB-STORM, CE-LSGD (Patel et al., 2022). |
| Researcher Affiliation | Academia | Ziheng Cheng Peking University EMAIL Xinmeng Huang University of Pennsylvania EMAIL Pengfei Wu Peking University EMAIL Kun Yuan Peking University EMAIL |
| Pseudocode | Yes | Algorithm 1 FEDAVG-M: FEDAVG with momentum... Algorithm 2 SCAFFOLD-M: SCAFFOLD with momentum... Algorithm 3 FEDAVG-M-VR: FEDAVG with variance-reduced momentum... Algorithm 4 SCAFFOLD-M-VR: SCAFFOLD with variance-reduced momentum |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code or a link to a code repository. |
| Open Datasets | Yes | We present experiments on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009)... We further conduct experiments with N = 100 on the MNIST dataset... |
| Dataset Splits | No | The paper mentions generating non-iid data using the Dirichlet distribution and setting hyperparameters via grid search, but it does not provide specific percentages or counts for training, validation, and test splits. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used to run the experiments. It only mentions general setups like 'large-scale machine learning' or 'edge devices' in theoretical context. |
| Software Dependencies | No | The paper mentions 'current Pytorch implementation of momentum-based methods' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The MLP experiments involve K = 32 local updates and N = 10 clients... The weight decay is set as 10^-4. The global learning rate is fixed as γ = ηK for all the algorithms, and we perform a grid search for the local learning rate η in values {0.005, 0.01, 0.05, 0.1, 0.5}. Similarly, we search for the momentum parameter β in values {0.1, 0.2, 0.5, 0.8}. ... The experiment involves N = 10 clients and K = 16 local updates. We set S = 2 in partial client participation. The local learning is fixed as ˆη = 0.001 and global learning rate is ˆγ = ˆηK. The momentum parameter is β = 0.1 and batchsize is 128. |
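
The hyperparameter grid quoted in the Experiment Setup row for the MLP experiments can be enumerated as follows. This is a minimal sketch, not the authors' code: the `build_grid` helper and the dictionary keys are hypothetical names chosen for illustration, and only the numeric values (K = 32, N = 10, weight decay 10^-4, the η and β grids, and γ = ηK) come from the row above.

```python
from itertools import product

# Values quoted in the Experiment Setup row (MLP experiments).
K = 32                 # local updates per round
N = 10                 # number of clients
WEIGHT_DECAY = 1e-4
LOCAL_LRS = [0.005, 0.01, 0.05, 0.1, 0.5]  # grid for the local learning rate eta
MOMENTUMS = [0.1, 0.2, 0.5, 0.8]           # grid for the momentum parameter beta


def build_grid(local_lrs=LOCAL_LRS, momentums=MOMENTUMS, k=K):
    """Enumerate every (eta, beta) pair, tying the global rate to gamma = eta * K.

    The key names below are illustrative assumptions, not taken from the paper.
    """
    grid = []
    for eta, beta in product(local_lrs, momentums):
        grid.append({
            "num_clients": N,
            "local_updates": k,
            "weight_decay": WEIGHT_DECAY,
            "local_lr": eta,
            "global_lr": eta * k,  # gamma = eta * K, as stated in the row
            "momentum": beta,
        })
    return grid


grid = build_grid()
print(f"{len(grid)} configurations to evaluate")
```

Because the global learning rate is tied to the local one, the search is two-dimensional (η and β) rather than three-dimensional.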