Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm

Authors: Batiste Le Bars, Aurélien Bellet, Marc Tommasi, Kevin Scaman, Giovanni Neglia

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Figure 1, we represent the generalization errors observed empirically for different communication graphs (see Appendix E.4 for experimental details).
Researcher Affiliation | Academia | 1. Inria Paris, École Normale Supérieure, PSL Research University; 2. Inria, Université de Montpellier; 3. Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189, CRIStAL, F-59000 Lille; 4. Inria, Université Côte d'Azur.
Pseudocode | Yes | Algorithm 1 Decentralized SGD (Lian et al., 2017)
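The D-SGD algorithm referenced above can be illustrated with a minimal NumPy sketch: each agent takes a local gradient step, then averages its parameters with its neighbors through a doubly stochastic mixing matrix W. The function name and array shapes are our own choices, not the paper's notation.

```python
import numpy as np

def dsgd_step(theta, grads, W, eta):
    """One D-SGD iteration (sketch).

    theta: (m, d) array of stacked local parameters, one row per agent.
    grads: (m, d) array of local stochastic gradients.
    W:     (m, m) doubly stochastic mixing matrix of the communication graph.
    eta:   step size.
    """
    # Local gradient step, then gossip averaging with the neighbors' iterates.
    return W @ (theta - eta * grads)
```

With a complete-graph matrix W (all entries 1/m), a single step already collapses all agents onto the network average, which is the extreme case of fast mixing.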
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository for the methodology described.
Open Datasets | No | We consider a logistic regression problem with two classes. Each data point (X, Y) is i.i.d. and drawn as follows. With probability 0.5, the point is first associated to a class Y = 0 or 1. If Y = 0, then X follows a bivariate Gaussian random variable with mean vector (1, 1) and isotropic covariance I. If Y = 1, then the mean vector is (−1, −1). To make the problem slightly more complicated and avoid separability, Y is then flipped with probability 0.1.
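The synthetic data described in this row can be sampled in a few lines of NumPy. This is a sketch under our reading of the quote (balanced classes, means ±(1, 1), identity covariance, 10% label noise); the function name is ours.

```python
import numpy as np

def sample_data(n, rng):
    """Draw n i.i.d. points from the paper's two-class Gaussian mixture."""
    # Balanced class assignment: Y = 0 or 1 with probability 0.5 each.
    y = rng.integers(0, 2, size=n)
    # Class 0 has mean (1, 1); class 1 has mean (-1, -1); covariance I.
    means = np.where(y == 0, 1.0, -1.0)[:, None]  # (n, 1), broadcast to (n, 2)
    X = rng.normal(size=(n, 2)) + means
    # Flip each label with probability 0.1 to avoid separability.
    flip = rng.random(n) < 0.1
    y = np.where(flip, 1 - y, y)
    return X, y
```

Because the flipped labels overlap the opposite Gaussian, no linear separator achieves zero training error, which is the stated intent of the noise.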
Dataset Splits | No | At each iteration t = 1, . . . , T, we compute a test loss (empirical population risk) using 500 i.i.d. data points, evaluated at all parameters θ_1^(t), . . . , θ_m^(t), and compute the difference with the associated training loss (full empirical risk).
Hardware Specification | No | The paper describes the experimental setup including dataset generation and training parameters, but does not mention specific hardware components like GPU or CPU models.
Software Dependencies | No | The paper describes the algorithms and problem setting but does not specify any software libraries or their version numbers.
Experiment Setup | Yes | For the training, we have m = 20 agents. To simulate the low noise regime, we take n = 1 local data point (i.e. full batch: σ² = 0), while we take n = 10 local data points in the higher noise regime. We then run D-SGD (Variant B) for T = 500 iterations, with constant step size η = 0.03 and initial point θ^(0) = 0. We consider four communication graphs: (i) Complete graph with uniform weights 1/m, (ii) Identity graph I (local SGD), (iii) Circle graph with self-edges and uniform weights 1/3, and (iv) Complete graph with diagonal elements equal to 0.95 and remaining elements uniformly equal to 0.05/(m − 1).
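The four mixing matrices listed in this row can be constructed directly from their descriptions. A minimal NumPy sketch, assuming the standard convention that each row of W holds an agent's averaging weights:

```python
import numpy as np

def mixing_matrices(m=20):
    """Build the four communication-graph mixing matrices from the setup."""
    # (i) Complete graph with uniform weights 1/m.
    W_complete = np.full((m, m), 1.0 / m)
    # (ii) Identity graph (no communication: local SGD).
    W_identity = np.eye(m)
    # (iii) Circle graph with self-edges, uniform weights 1/3.
    W_circle = np.zeros((m, m))
    for i in range(m):
        W_circle[i, [(i - 1) % m, i, (i + 1) % m]] = 1.0 / 3
    # (iv) Complete graph, self-weight 0.95, off-diagonal 0.05/(m - 1).
    W_lazy = np.full((m, m), 0.05 / (m - 1))
    np.fill_diagonal(W_lazy, 0.95)
    return W_complete, W_identity, W_circle, W_lazy
```

All four matrices are symmetric and doubly stochastic, as required for the gossip-averaging step of D-SGD; they differ only in how fast they mix, which is the quantity the paper's generalization bounds depend on.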