ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
Authors: Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich, Peter Richtarik
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test the performance of algorithms and illustrate theoretical results, we use the classical logistic regression problem. In our experiments, we have two settings: deterministic problems (Figure 1) and stochastic problems (Figure 2). |
| Researcher Affiliation | Academia | ¹CNRS, ENS, Inria Sierra, Paris, France; ²Computer Science, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; ³CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. |
| Pseudocode | Yes | Algorithm 1 ProxSkip ... Algorithm 2 Scaffnew: Application of ProxSkip to Federated Learning (i.e., to problem (6)–(7)) ... Algorithm 3 Decentralized Scaffnew ... Algorithm 4 SProxSkip (Stochastic gradient version of ProxSkip) ... Algorithm 5 SplitSkip. (A minimal sketch of the ProxSkip step appears after the table.) |
| Open Source Code | Yes | Our code is available on GitHub: https://github.com/alarcoelectro/ProxSkip-Public |
| Open Datasets | Yes | We use the w8a dataset from LIBSVM library (Chang & Lin, 2011). |
| Dataset Splits | No | The paper mentions using the "w8a dataset from LIBSVM library" but does not provide explicit details about how it was split into training, validation, or test sets (e.g., percentages, sample counts, or a specific split methodology). |
| Hardware Specification | Yes | All methods were evaluated on a workstation with an Intel(R) Xeon(R) Gold 6146 CPU at 3.20GHz with 24 cores. |
| Software Dependencies | No | We implemented all algorithms in Python using the package RAY (Moritz et al., 2018). While the paper names Python and the Ray package, it does not provide version numbers for either, which would be needed for exact reproduction. |
| Experiment Setup | Yes | We set the regularization parameter λ = 10⁻⁴L, where L is the smoothness constant. ... The number of local steps is set to be √κ̂, where κ̂ = L/μ̂ is the estimated condition number. ... Our theory predicted that the choice p = 1/√κ is optimal, which is close to the experimental results. Finally, in Figure 2 (c), we compared Scaffnew in the stochastic case with different numbers of clients M. (A sketch of this setup appears after the table.) |
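
To make the pseudocode row concrete, here is a minimal Python sketch of a ProxSkip-style loop, paraphrased from the Algorithm 1 listing quoted above: a gradient step shifted by a control variate, a prox step applied only with probability p, and a control-variate update. The function names `grad_f` and `prox_psi`, the zero initialization of the control variate, and the fixed random seed are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def prox_skip(grad_f, prox_psi, x0, gamma, p, num_iters, rng=None):
    """Minimal sketch of a ProxSkip-style loop (paraphrased from Algorithm 1).

    grad_f   : callable returning the gradient of the smooth part f at x
    prox_psi : callable (x, step) -> prox_{step * psi}(x) for the non-smooth part
    gamma    : stepsize; p : probability of taking the (expensive) prox step
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    h = np.zeros_like(x)                          # control variate, h_0 = 0 for simplicity
    for _ in range(num_iters):
        x_hat = x - gamma * (grad_f(x) - h)       # shifted gradient step
        if rng.random() < p:                      # prox applied only with probability p
            x = prox_psi(x_hat - (gamma / p) * h, gamma / p)
        else:                                     # otherwise the prox is skipped
            x = x_hat
        h = h + (p / gamma) * (x - x_hat)         # update the control variate
    return x
```

In the federated reading of the algorithm (Scaffnew), the prox plays the role of communication, so the expected number of local steps between communications is 1/p.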
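The experiment-setup row can likewise be sketched in Python. The snippet below loads w8a in LIBSVM format via scikit-learn's `load_svmlight_file` (assuming the file has already been downloaded locally from the LIBSVM site), sets λ = 10⁻⁴L, and derives p = 1/√κ̂. The file path, the smoothness bound used for L, and taking μ̂ = λ are assumptions for illustration, not details confirmed by the paper.

```python
import numpy as np
from sklearn.datasets import load_svmlight_file   # parses LIBSVM-format files

# Assumes w8a was downloaded from the LIBSVM website to this local path (hypothetical).
A, b = load_svmlight_file("w8a")                   # b has labels in {-1, +1}
A = A.toarray()
n, d = A.shape

# One common smoothness bound for the logistic loss term: L <= ||A||_2^2 / (4 n).
L = np.linalg.norm(A, 2) ** 2 / (4 * n)
lam = 1e-4 * L                         # regularization lambda = 10^-4 * L, as in the paper
mu_hat = lam                           # strong-convexity estimate taken from the regularizer
kappa_hat = L / mu_hat                 # estimated condition number
p = 1.0 / np.sqrt(kappa_hat)           # communication probability p = 1 / sqrt(kappa_hat)
local_steps = int(np.ceil(1.0 / p))    # expected local steps between communications

def logistic_grad(x):
    """Gradient of the l2-regularized logistic loss with labels in {-1, +1}."""
    s = 1.0 / (1.0 + np.exp(b * (A @ x)))          # sigma(-b_i * a_i^T x)
    return -(A.T @ (b * s)) / n + lam * x
```

This `logistic_grad` could be passed as `grad_f` to the loop sketched above; in the distributed experiments the data would instead be partitioned across the M clients.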