SILVER: Single-loop variance reduction and application to federated learning
Authors: Kazusato Oko, Shunta Akiyama, Denny Wu, Tomoya Murata, Taiji Suzuki
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Numerical Experiments): Finally, we verify our theories by numerical experiments. Detailed explanation and additional experiments can be found in Appendix B. In all figures, the red line corresponds to our proposed method. Section 5.1 (Accuracy of the Gradient Estimator of SILVER): We considered a classification of the capital letters using EMNIST (Cohen et al., 2017) with a two-layer neural network. ... Figure 2 caption: Test accuracy (finite-sum) |
| Researcher Affiliation | Collaboration | The University of Tokyo; Center for Advanced Intelligence Project, RIKEN; New York University; Flatiron Institute; NTT DATA Mathematical Systems Inc. |
| Pseudocode | Yes | Algorithm 1 SILVER(x0, η, b, T, r) ... Algorithm 2 FL-SILVER(x0, η, p, b, T, K, r) |
| Open Source Code | No | The paper states in a checklist in the Appendix that anonymized source code is included, but there is no explicit statement in the main text or appendix about releasing the code or a link to a repository. |
| Open Datasets | Yes | We considered a classification of the capital letters using EMNIST (Cohen et al., 2017) with a two-layer neural network. |
| Dataset Splits | No | The paper describes using the EMNIST dataset and creating groups for experiments but does not explicitly detail the training, validation, and test dataset splits or percentages used. |
| Hardware Specification | Yes | OS: Ubuntu 16.04.5; CPU: Intel(R) Xeon(R) E5-2680 v4 @ 2.40GHz; CPU memory: 512GB; GPU: NVIDIA Tesla V100 (32GB) |
| Software Dependencies | Yes | Programming language: Python 3.6.13; Deep learning framework: PyTorch 1.7.1 |
| Experiment Setup | Yes | From B.1.1: We set n = 130, b = 1/2*130, and the inner-loop length of SARAH to n/b = 10; we set the learning rate to η = 0.01 for all algorithms. From B.1.2: We set the minibatch size to b = 12 for all algorithms, the inner-loop length of SARAH and SSRGD to m = n/b = 10, and λ = b/n * 0.092 for ZeroSARAH. From B.1.3: We set the number of local updates to K = 10 and the local minibatch size to b = 16; we tuned the learning rate for each algorithm individually from {1.0, 0.3, 0.1, 0.03, 0.01, 0.003, 0.001}. (Illustrative sketches of a baseline SARAH recursion and of this setup follow the table.) |
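
For readers reproducing the baselines quoted in the Experiment Setup row, the snippet below is a minimal NumPy sketch of the standard SARAH recursion with inner-loop length m and step size η. It illustrates the double-loop structure (full-gradient restarts plus a recursive minibatch correction) that SILVER's single-loop estimator is designed to avoid; it is not the authors' SILVER or FL-SILVER update, and the function and variable names are illustrative.

```python
import numpy as np

def sarah(grad_i, n, x0, eta=0.01, b=12, m=10, epochs=10, rng=None):
    """Standard SARAH: a recursive variance-reduced gradient estimator with
    periodic full-gradient restarts (baseline sketch, not the paper's SILVER)."""
    rng = np.random.default_rng() if rng is None else rng
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(epochs):
        # Outer step: restart the estimator with a full gradient at x.
        v = np.mean([grad_i(x, i) for i in range(n)], axis=0)
        x_prev, x = x, x - eta * v
        for _ in range(m):
            # Inner step: recursive correction on a minibatch of size b.
            batch = rng.choice(n, size=b, replace=False)
            g_cur = np.mean([grad_i(x, i) for i in batch], axis=0)
            g_prev = np.mean([grad_i(x_prev, i) for i in batch], axis=0)
            v = g_cur - g_prev + v
            x_prev, x = x, x - eta * v
    return x
```

For a least-squares toy problem, `grad_i(x, i)` would return `a_i * (a_i @ x - y_i)`.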
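
The EMNIST experiment described in the table (two-layer network, minibatch size b = 12, learning-rate grid from B.1.3) can be approximated with the PyTorch sketch below. The torchvision `EMNIST` "letters" split is used here as a stand-in for the capital-letter task; the hidden width, number of epochs, and plain SGD update are assumptions, since the paper's variance-reduced optimizers would replace the SGD step.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Two-layer network for 26-way letter classification (hidden width 256 is an
# illustrative choice; the paper's exact architecture details are not quoted).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Linear(256, 26),
)

# EMNIST "letters" split as a stand-in for the capital-letter task.
transform = transforms.ToTensor()
train_set = datasets.EMNIST("./data", split="letters", train=True,
                            download=True, transform=transform)
test_set = datasets.EMNIST("./data", split="letters", train=False,
                           download=True, transform=transform)

# Minibatch size b = 12 as quoted from Appendix B.1.2.
train_loader = DataLoader(train_set, batch_size=12, shuffle=True)
test_loader = DataLoader(test_set, batch_size=256)

criterion = nn.CrossEntropyLoss()
# Learning rate taken from the grid quoted in B.1.3 (plain SGD here; the
# paper's variance-reduced updates would replace this step).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):  # number of epochs is an assumption
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y - 1)  # "letters" labels are 1..26
        loss.backward()
        optimizer.step()
```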