From Local SGD to Local Fixed-Point Methods for Federated Learning

Authors: Grigory Malinovskiy, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtarik

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform convergence analysis of both methods and conduct a number of experiments highlighting the benefits of our approach. We implemented all algorithms in Python using the package MPI4PY, in order to run the code on a truly parallel architecture. All methods were evaluated on a computer with an Intel(R) Xeon(R) Gold 6146 CPU at 3.20GHz, having 24 cores. The cores are connected to 2 sockets, with 12 cores for each of them. The results of Algorithms 1 and 2 are illustrated in Figures 1 and 3, respectively."
Researcher Affiliation | Academia | "1 Moscow Institute of Physics and Technology, Dolgoprudny, Russia; 2 King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia."
Pseudocode | Yes | "Algorithm 1: Local fixed-point method." "Algorithm 2: Randomized fixed-point method."
Open Source Code | No | The paper states that the algorithms were implemented in Python but provides no link to, or explicit statement about, open-sourcing the code.
Open Datasets | Yes | "We use the a9a and a4a datasets from the LIBSVM library and we set κ to be L n, where n is the size of the dataset and L is a Lipschitz constant of the first part of f, without regularization."
Dataset Splits | No | The paper mentions the a9a and a4a datasets but specifies no train, validation, or test splits, and does not refer to standard splits with citations.
Hardware Specification | Yes | "All methods were evaluated on a computer with an Intel(R) Xeon(R) Gold 6146 CPU at 3.20GHz, having 24 cores. The cores are connected to 2 sockets, with 12 cores for each of them."
Software Dependencies | No | "We implemented all algorithms in Python using the package MPI4PY." No version numbers for Python or MPI4PY are provided.
Experiment Setup | Yes | "Input: Initial estimate x̂0 ∈ ℝ^d, stepsize λ > 0, sequence of synchronization times 0 = t0 < t1 < …" (Algorithm 1). "Input: Initial estimate x̂0 ∈ ℝ^d, stepsize λ > 0, communication probability 0 < p ≤ 1" (Algorithm 2). "We use 1/L as the stepsize, so that each Ti is firmly nonexpansive. We observe a very tight match between our theory and the numerical results. As can be seen, the larger the value of the parameters H and λ, the faster the convergence at the beginning, but the larger the radius of the neighborhood."
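The Experiment Setup row outlines the structure of Algorithm 1: each device repeatedly applies its local operator Ti, and devices periodically synchronize by averaging their iterates. A minimal pure-Python sketch of that pattern, assuming illustrative one-dimensional operators Ti(x) = x − λ(x − b_i) (i.e. gradient steps on toy quadratics f_i(x) = ½(x − b_i)²; the function name, the b_i data, and the quadratics are assumptions for illustration, not taken from the paper):

```python
# Sketch of a local fixed-point method in the spirit of Algorithm 1:
# each device runs H local applications of its operator T_i, then all
# devices average their iterates (the synchronization step).

def local_fixed_point(b, lam=1.0, H=5, rounds=20, x0=0.0):
    """b: list of per-device targets b_i; returns the final averaged iterate."""
    x = [x0] * len(b)                  # every device starts from the same estimate
    for _ in range(rounds):
        for _ in range(H):             # H local operator applications
            x = [xi - lam * (xi - bi) for xi, bi in zip(x, b)]
        avg = sum(x) / len(x)          # synchronization: average across devices
        x = [avg] * len(x)
    return x[0]

# With lam = 1/L = 1, each local step maps x_i straight to b_i, and the
# averaging step recovers the mean of the b_i.
print(local_fixed_point([1.0, 2.0, 3.0]))  # → 2.0
```

Larger H and λ make the initial local progress faster but, as the quoted setup notes, enlarge the neighborhood of the solution that the method converges to.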
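Algorithm 2 replaces the fixed synchronization schedule with a random one: after each local step, the devices communicate with probability p. A sketch under the same illustrative assumptions as above (toy quadratic operators; the function name and data are hypothetical):

```python
import random

def randomized_fixed_point(b, lam=1.0, p=0.2, steps=200, x0=0.0, seed=0):
    """After each local application of T_i(x) = x - lam*(x - b_i), devices
    average their iterates with probability p (Algorithm 2's communication
    probability); the operators themselves are illustrative only."""
    rng = random.Random(seed)
    x = [x0] * len(b)
    for _ in range(steps):
        x = [xi - lam * (xi - bi) for xi, bi in zip(x, b)]  # local step
        if rng.random() < p:            # synchronize with probability p
            avg = sum(x) / len(x)
            x = [avg] * len(x)
    return sum(x) / len(x)

print(randomized_fixed_point([1.0, 2.0, 3.0]))  # → 2.0
```

Smaller p means fewer communication rounds per iteration, the trade-off the randomized variant is designed to control.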