Stability and Generalization of Decentralized Stochastic Gradient Descent
Authors: Tao Sun, Dongsheng Li, Bao Wang (pp. 9756–9764)
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our theoretical findings by using a variety of decentralized settings and benchmark machine learning models. |
| Researcher Affiliation | Academia | Tao Sun¹, Dongsheng Li¹, and Bao Wang². ¹College of Computer, National University of Defense Technology, Changsha, Hunan, China. ²Scientific Computing & Imaging Institute, University of Utah, USA. |
| Pseudocode | Yes | Algorithm 1 Decentralized Stochastic Gradient Descent (DSGD). Require: step sizes (α_t > 0)_{t≥0}, initialization x⁰. For each node i = 1, 2, ..., m: for t = 1, 2, ...: update the local parameter as in (3) and (4); x̄ᵗ = (1/m) Σᵢ₌₁ᵐ xᵗ(i); end for; end for. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We use the Body Fat dataset (Johnson 1996)... We use the benchmark ijcnn1 dataset (Rennie and Rifkin 2001)... ResNet-20 (He et al. 2016) for CIFAR10 classification (Krizhevsky 2009). |
| Dataset Splits | No | The paper mentions using subsets of data for experiments but does not provide specific training/validation/test dataset splits or cross-validation details for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers). |
| Experiment Setup | Yes | We set the number of nodes m to 10 and conduct two kinds of experiments... We compare the training loss and training accuracy of D-SGD on these two datasets... For the above six graphs, we record the absolute difference in the value of function Φ for a set of learning rate, namely, {0.001, 0.004, 0.016, 0.064}... and set λ = 10−4... 100 epochs are used in the nonconvex test. |
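The DSGD procedure quoted in the Pseudocode row can be sketched in a few lines: each node takes a local stochastic gradient step and then mixes its parameter with its neighbors via a doubly stochastic matrix W, and the reported iterate is the node average x̄ᵗ. This is a minimal illustration of Algorithm 1, not the paper's code; the quadratic objective, ring topology, step size, and noise level below are illustrative assumptions.

```python
import numpy as np

def dsgd(grad, x0, W, alpha, T, rng):
    """Sketch of Algorithm 1 (DSGD): local SGD step, then gossip mixing.

    grad(x_i, rng) returns a stochastic gradient at node i's parameter;
    W is an m-by-m doubly stochastic mixing matrix over the node graph.
    Returns the node average x_bar = (1/m) * sum_i x(i) after T rounds.
    """
    x = x0.copy()                                   # shape (m, d): one row per node
    for _ in range(T):
        g = np.stack([grad(xi, rng) for xi in x])   # local stochastic gradients
        x = x - alpha * g                           # local SGD step on every node
        x = W @ x                                   # mixing: x(i) <- sum_j W[i,j] x(j)
    return x.mean(axis=0)

# Illustrative setup (assumed, not from the paper): m = 10 nodes on a ring,
# minimizing f(x) = 0.5 * ||x - b||^2 with additive gradient noise.
rng = np.random.default_rng(0)
m, d = 10, 3
b = 2.0 * np.ones(d)

def noisy_grad(xi, rng):
    return xi - b + 0.01 * rng.standard_normal(d)

# Ring mixing matrix: weight 0.5 on self, 0.25 on each neighbor (doubly stochastic).
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = 0.25
    W[i, (i + 1) % m] = 0.25

x_avg = dsgd(noisy_grad, np.zeros((m, d)), W, alpha=0.1, T=1000, rng=rng)
```

Under this setup the averaged iterate x_avg settles near the minimizer b, which is the quantity the paper's stability analysis tracks.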