Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm
Authors: Batiste Le Bars, Aurélien Bellet, Marc Tommasi, Kevin Scaman, Giovanni Neglia
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Figure 1, we represent the generalization errors observed empirically for different communication graphs (see Appendix E.4 for experimental details). |
| Researcher Affiliation | Academia | 1Inria Paris, École Normale Supérieure, PSL Research University 2Inria, Université de Montpellier 3Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189, CRIStAL, F-59000 Lille 4Inria, Université Côte d'Azur. |
| Pseudocode | Yes | Algorithm 1 Decentralized SGD (Lian et al., 2017) |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository for the methodology described. |
| Open Datasets | No | We consider a logistic regression problem with two classes. Each data point (X, Y) is i.i.d. and drawn as follows. With probability 0.5, the point is first associated to a class Y = 0 or 1. If Y = 0 then X follows a bivariate Gaussian random variable with mean vector (1, 1) and isotropic covariance I. If Y = 1, then the mean vector is (−1, −1). To make the problem slightly more complicated and avoid separability, Y is then flipped with probability 0.1. |
| Dataset Splits | No | At each iteration t = 1, . . . , T, we compute a test loss (empirical population risk) using 500 i.i.d. data points, evaluated at all parameters θ_1^(t), . . . , θ_m^(t), and compute the difference with the associated training loss (full empirical risk). |
| Hardware Specification | No | The paper describes the experimental setup including dataset generation and training parameters, but does not mention specific hardware components like GPU or CPU models. |
| Software Dependencies | No | The paper describes the algorithms and problem setting but does not specify any software libraries or their version numbers. |
| Experiment Setup | Yes | For the training, we have m = 20 agents. To simulate the low noise regime, we take n = 1 local data point (i.e. full batch: σ² = 0), while we take n = 10 local data points in the higher noise regime. We then run D-SGD (Variant B) for T = 500 iterations, with constant step size η = 0.03 and initial point θ^(0) = 0. We consider four communication graphs: (i) Complete graph with uniform weights 1/m, (ii) Identity graph I (local SGD), (iii) Circle graph with self-edges and uniform weights 1/3, and (iv) Complete graph with diagonal elements equal to 0.95 and remaining elements uniformly equal to 0.05/(m − 1). |
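The quoted setup (synthetic Gaussian-mixture data with label flipping, m = 20 agents, constant step size, gossip matrix W) can be sketched as follows. This is a minimal illustration, not the authors' code: the exact update order of "Variant B" is not specified in the quotes above, so we assume the common D-SGD form where each agent takes a local gradient step and then averages parameters with its neighbors via W; function names and the random seed are ours.

```python
import numpy as np

def make_dataset(n, rng):
    """Synthetic two-class data as described in the paper: Y ~ Bernoulli(0.5);
    X | Y=0 ~ N((1, 1), I), X | Y=1 ~ N((-1, -1), I); the label is then
    flipped with probability 0.1 to avoid separability."""
    y = rng.integers(0, 2, size=n)
    means = np.where(y[:, None] == 0, 1.0, -1.0)  # mean (1,1) or (-1,-1)
    x = means + rng.standard_normal((n, 2))
    y = np.where(rng.random(n) < 0.1, 1 - y, y)   # label noise
    return x, y

def logistic_grad(theta, x, y):
    """Gradient of the mean logistic loss on a local dataset (x, y)."""
    p = 1.0 / (1.0 + np.exp(-(x @ theta)))
    return x.T @ (p - y) / len(y)

def d_sgd(W, local_x, local_y, T=500, eta=0.03):
    """D-SGD sketch: each agent does a full-batch local gradient step,
    then gossips its parameters through the mixing matrix W."""
    m, d = W.shape[0], local_x[0].shape[1]
    theta = np.zeros((m, d))                      # initial point theta^(0) = 0
    for _ in range(T):
        grads = np.stack([logistic_grad(theta[i], local_x[i], local_y[i])
                          for i in range(m)])
        theta = W @ (theta - eta * grads)
    return theta

rng = np.random.default_rng(0)
m, n = 20, 10                                     # higher-noise regime: n = 10
data = [make_dataset(n, rng) for _ in range(m)]
local_x, local_y = [x for x, _ in data], [y for _, y in data]
W = np.full((m, m), 1.0 / m)                      # (i) complete graph, weights 1/m
theta = d_sgd(W, local_x, local_y)
```

The other three gossip matrices from the table are obtained the same way: `np.eye(m)` for local SGD, a circulant matrix with 1/3 on the diagonal and the two neighboring off-diagonals for the circle graph, and `0.05 / (m - 1)` off-diagonal with `0.95` on the diagonal for the lazy complete graph.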