On the Convergence of FedAvg on Non-IID Data
Authors: Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify our results through numerical experiments. |
| Researcher Affiliation | Academia | Xiang Li, School of Mathematical Sciences, Peking University, Beijing 100871, China (smslixiang@pku.edu.cn); Kaixuan Huang, School of Mathematical Sciences, Peking University, Beijing 100871, China (hackyhuang@pku.edu.cn); Wenhao Yang, Center for Data Science, Peking University, Beijing 100871, China (yangwenhaosms@pku.edu.cn); Shusen Wang, Department of Computer Science, Stevens Institute of Technology, Hoboken, NJ 07030, USA (shusen.wang@stevens.edu); Zhihua Zhang, School of Mathematical Sciences, Peking University, Beijing 100871, China (zhzhang@math.pku.edu.cn) |
| Pseudocode | No | The paper describes the algorithm steps in a descriptive paragraph under 'Algorithm description' in Section 2, but does not include a formal pseudocode block or algorithm listing. (A minimal sketch of the described round procedure is provided after this table.) |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing its code or a link to a code repository. |
| Open Datasets | Yes | We distribute MNIST dataset (LeCun et al., 1998) among N = 100 workers in a non-iid fashion such that each device contains samples of only two digits. (A partitioning sketch that realizes such a split follows the table.) |
| Dataset Splits | No | The paper describes how data is distributed across devices and how the model is evaluated during training, but it does not specify explicit training/validation/test dataset splits for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | The regularization parameter is set to λ = 10⁻⁴. In each round, all selected devices run E steps of SGD in parallel. We decay the learning rate at the end of each round by the following scheme ηt = η0 / (1 + t), where η0 is chosen from the set {1, 0.1, 0.01}. For unbalanced MNIST, we use batch size b = 64. The hyperparameters are the same for all schemes: E = 20, K = 10 and b = 64. (A configuration sketch follows the table.) |
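
Since the paper gives only a prose description of FedAvg (Section 2, 'Algorithm description') rather than pseudocode, the following is a minimal sketch of one communication round under that description: the server samples K devices, each runs E local mini-batch SGD steps from the current global model, and the server aggregates the local models. The `grad_fn` argument and the data-size weighting are illustrative assumptions; the paper analyzes specific device-sampling and averaging schemes that this sketch does not distinguish.

```python
import numpy as np

def fedavg_round(global_w, devices, grad_fn, K=10, E=20, b=64, eta=0.1, rng=None):
    """One FedAvg communication round, following the paper's prose description:
    sample K devices, let each run E local mini-batch SGD steps starting from
    the current global model, then aggregate the local models (here weighted
    by local data size, which is one plausible instantiation)."""
    rng = rng or np.random.default_rng(0)
    sampled = rng.choice(len(devices), size=K, replace=False)
    agg = np.zeros_like(global_w)
    total = 0
    for i in sampled:
        X, y = devices[i]                          # local data held by device i
        w = global_w.copy()
        for _ in range(E):                         # E local SGD steps
            idx = rng.choice(len(y), size=min(b, len(y)), replace=False)
            w -= eta * grad_fn(w, X[idx], y[idx])  # grad_fn: mini-batch gradient of the local loss
        agg += len(y) * w
        total += len(y)
    return agg / total                             # new global model
```

A toy usage under the same assumptions: `w = fedavg_round(w, devices, grad_fn)`, where `grad_fn(w, X, y)` returns the mini-batch gradient of the λ-regularized multinomial logistic loss used in the paper's experiments.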
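The non-IID MNIST split ("each device contains samples of only two digits", N = 100) can be realized with a standard sort-and-shard partition. The construction below is a common way to produce such a split, not a procedure stated in the paper; the paper additionally uses an unbalanced variant in which devices hold different numbers of samples, which this sketch does not reproduce.

```python
import numpy as np

def partition_noniid(labels, num_devices=100, shards_per_device=2, rng=None):
    """Sort-and-shard split: order the sample indices by digit, cut them into
    num_devices * shards_per_device contiguous shards, and assign each device
    shards_per_device shards, so each device's samples are concentrated on a
    couple of digits (boundary shards may straddle two digits)."""
    rng = rng or np.random.default_rng(0)
    order = np.argsort(labels)                                   # indices grouped by digit
    shards = np.array_split(order, num_devices * shards_per_device)
    shard_ids = rng.permutation(len(shards))
    return [
        np.concatenate([shards[s] for s in
                        shard_ids[d * shards_per_device:(d + 1) * shards_per_device]])
        for d in range(num_devices)
    ]
```

With `part = partition_noniid(y_train)`, device d's local data would be `(X_train[part[d]], y_train[part[d]])`, i.e. one entry of the `devices` list in the round sketch above.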
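The reported setup (λ = 10⁻⁴ regularization, learning rate decayed at the end of each round as ηt = η0 / (1 + t) with η0 chosen from {1, 0.1, 0.01}, and E = 20, K = 10, b = 64) translates directly into a small configuration block; variable names are illustrative.

```python
# Hyperparameters reported in the experiment setup (names are illustrative).
E, K, b = 20, 10, 64           # local SGD steps, sampled devices per round, batch size
lam = 1e-4                     # L2 regularization parameter lambda
eta0_grid = [1.0, 0.1, 0.01]   # candidate initial learning rates

def decayed_lr(eta0, t):
    """Learning rate for round t under the decay scheme eta_t = eta0 / (1 + t)."""
    return eta0 / (1 + t)
```

Combined with the round sketch above, round t would be run with `eta=decayed_lr(eta0, t)`, the decay being applied at the end of each round as stated in the paper.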