ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

Authors: Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich, Peter Richtárik

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To test the performance of algorithms and illustrate theoretical results, we use a classical logistic regression problem. In our experiments, we have two settings: deterministic (Figure 1) and stochastic problems (Figure 2).
Researcher Affiliation | Academia | (1) CNRS, ENS, Inria Sierra, Paris, France; (2) Computer Science, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; (3) CISPA Helmholtz Center for Information Security, Saarbrücken, Germany.
Pseudocode | Yes | Algorithm 1 ProxSkip ... Algorithm 2 Scaffnew: Application of ProxSkip to Federated Learning (i.e., to problem (6)-(7)) ... Algorithm 3 Decentralized Scaffnew ... Algorithm 4 SProxSkip (Stochastic gradient version of ProxSkip) ... Algorithm 5 SplitSkip
Open Source Code | Yes | Our code is available on GitHub: https://github.com/alarcoelectro/ProxSkip-Public
Open Datasets | Yes | We use the w8a dataset from LIBSVM library (Chang & Lin, 2011).
Dataset Splits | No | The paper mentions using the "w8a dataset from LIBSVM library" but does not provide explicit details about how it was split into training, validation, or test sets (e.g., percentages, sample counts, or a specific split methodology).
Hardware Specification | Yes | All methods were evaluated on a workstation with an Intel(R) Xeon(R) Gold 6146 CPU at 3.20GHz with 24 cores.
Software Dependencies | No | We implemented all algorithms in Python using the package RAY (Moritz et al., 2018). While the paper names Python and the RAY package, it does not provide specific version numbers for either, which is required for reproducibility.
Experiment Setup | Yes | We set the regularization parameter $\lambda = 10^{-4} L$, where $L$ is the smoothness constant. ... The number of local steps is set to be $\hat{\kappa}$, where $\hat{\kappa} = L/\hat{\mu}$ is the estimated condition number. ... Our theory predicted that the choice $p = 1/\sqrt{\kappa}$ is optimal, which is close to the experimental results. Finally, in Figure 2 (c), we compared Scaffnew in the stochastic case with different numbers of clients M.
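
To make the "Pseudocode" and "Experiment Setup" entries above more concrete, here is a minimal single-machine sketch of the ProxSkip iteration (Algorithm 1) applied to L2-regularized logistic regression. It is not the authors' implementation. Several things are assumptions for illustration only: the data is synthetic rather than w8a, the nonsmooth term is an l1 penalty with weight `l1` (chosen because its prox is simple soft-thresholding), and the helper names `soft_threshold`, `logreg_grad`, `objective`, and `proxskip` are hypothetical. The regularization $\lambda = 10^{-4} L$ and the probability $p = 1/\sqrt{\kappa}$ follow the setup quoted above.

```python
import numpy as np


def soft_threshold(v, tau):
    """Prox of tau * ||.||_1 (assumed choice of the nonsmooth term psi)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def logreg_grad(x, A, b, lam):
    """Gradient of (1/n) sum_i log(1 + exp(-b_i a_i^T x)) + (lam/2) ||x||^2."""
    z = -b * (A @ x)
    sigma = 0.5 * (1.0 + np.tanh(0.5 * z))  # numerically stable sigmoid(z)
    return -(A.T @ (b * sigma)) / A.shape[0] + lam * x


def objective(x, A, b, lam, l1):
    """Smooth logistic loss + L2 term + l1 penalty."""
    z = -b * (A @ x)
    return np.logaddexp(0.0, z).mean() + 0.5 * lam * (x @ x) + l1 * np.abs(x).sum()


def proxskip(grad, prox, x0, gamma, p, num_iters, rng):
    """ProxSkip: a gradient step every iteration, a prox step with probability p.

    `prox(v, step)` must return prox_{step * psi}(v).
    Returns the final iterate and the number of prox evaluations.
    """
    x = x0.copy()
    h = np.zeros_like(x0)  # control variate, initialized at zero
    prox_calls = 0
    for _ in range(num_iters):
        x_hat = x - gamma * (grad(x) - h)  # shifted gradient step
        if rng.random() < p:
            # Rare prox step with the enlarged stepsize gamma / p.
            x = prox(x_hat - (gamma / p) * h, gamma / p)
            prox_calls += 1
        else:
            x = x_hat  # prox skipped
        h = h + (p / gamma) * (x - x_hat)  # unchanged when the prox is skipped
    return x, prox_calls


# Synthetic stand-in for the dataset (assumption: not w8a).
rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
b = np.where(A @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n) >= 0, 1.0, -1.0)

L_data = np.linalg.eigvalsh(A.T @ A).max() / (4 * n)  # smoothness of the logistic part
lam = 1e-4 * L_data            # lambda = 10^{-4} L, as in the quoted setup
l1 = 1e-3                      # hypothetical l1 weight
L_total = L_data + lam         # smoothness constant of the regularized loss
kappa = L_total / lam          # condition number bound (mu >= lam)
gamma = 1.0 / L_total
p = 1.0 / np.sqrt(kappa)

x_out, prox_calls = proxskip(
    lambda x: logreg_grad(x, A, b, lam),
    lambda v, step: soft_threshold(v, step * l1),
    np.zeros(d), gamma, p, 20000, rng,
)
print("objective at start:", objective(np.zeros(d), A, b, lam, l1))
print("objective at end:  ", objective(x_out, A, b, lam, l1))
print("prox evaluations:  ", prox_calls, "out of 20000 iterations")
```

In the federated setting (Scaffnew, Algorithm 2), the prox step corresponds to averaging across clients, so the count of prox evaluations plays the role of the number of communication rounds. The sketch above only covers the single-node form.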