Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD

Authors: Kun Yuan, Sulaiman A. Alghunaim, Xinmeng Huang

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical simulations are conducted to validate our theories. Keywords: Decentralized optimization, stochastic optimization, transient stage... 7. Numerical Simulation In this section, we validate the established theoretical results with numerical simulations. 7.1 Strongly-Convex Scenario Problem. We consider the following decentralized least-square problem... Simulation settings. In our simulations, we set d = 10 and M = 1000. To control the data heterogeneity across the nodes, we first let each node i be associated with a local solution x_i⋆, generated as x_i⋆ = x⋆ + v_i, where x⋆ ~ N(0, I_d) is a randomly generated vector and v_i ~ N(0, σ_h² I_d) controls the similarity between the local solutions. Generally speaking, a large σ_h² results in local solutions {x_i⋆} that are vastly different from each other. With x_i⋆ at hand, we can generate local data that follow distinct distributions. At node i, we generate each element of A_i from the standard normal distribution. The measurement b_i is generated by b_i = A_i x_i⋆ + s_i, where s_i ~ N(0, σ_s² I) is white noise.
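The data-generation recipe quoted above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the number of nodes n and the noise variance σ_s² are assumed values (the excerpt fixes only d = 10 and M = 1000).

```python
import numpy as np

rng = np.random.default_rng(0)

d, M, n = 10, 1000, 8        # feature dim and samples per node (from the paper); n is assumed
sigma_h2 = 0.2               # heterogeneity variance sigma_h^2 (value used in the paper)
sigma_s2 = 0.1               # measurement-noise variance sigma_s^2 (assumed)

x_star = rng.standard_normal(d)                        # shared solution x*
A, b, x_local = [], [], []
for i in range(n):
    v_i = np.sqrt(sigma_h2) * rng.standard_normal(d)   # v_i ~ N(0, sigma_h^2 I_d)
    x_i = x_star + v_i                                 # local solution x_i* = x* + v_i
    A_i = rng.standard_normal((M, d))                  # entries of A_i ~ N(0, 1)
    s_i = np.sqrt(sigma_s2) * rng.standard_normal(M)   # white noise s_i ~ N(0, sigma_s^2 I)
    b_i = A_i @ x_i + s_i                              # measurements b_i = A_i x_i* + s_i
    A.append(A_i)
    b.append(b_i)
    x_local.append(x_i)
```

Setting sigma_h2 = 0 makes all local solutions coincide, which is how the paper removes data heterogeneity in its homogeneous baseline.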
Researcher Affiliation | Academia | Kun Yuan EMAIL Center for Machine Learning Research, Peking University AI for Science Institute Beijing 100871, P. R. China; Sulaiman A. Alghunaim EMAIL Department of Electrical Engineering Kuwait University Safat 13060, Kuwait; Xinmeng Huang EMAIL Graduate Group in Applied Mathematics and Computational Science University of Pennsylvania Philadelphia, PA 19104, USA
Pseudocode | Yes | Algorithm 1: D2/Exact-Diffusion... Algorithm 2: D2/Exact-Diffusion with multiple gossip steps... Algorithm 3: x_i = Fast Gossip Average
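For readers unfamiliar with the algorithm named in the pseudocode row, the D2/Exact-Diffusion recursion x^(k+1) = W(2x^(k) − x^(k−1) − γ(g^(k) − g^(k−1))) can be sketched as a single NumPy step. This is a hedged illustration of the standard D2/Exact-Diffusion update from the literature, not a reproduction of the paper's Algorithm 1; the function name and stacked-matrix representation are choices made here.

```python
import numpy as np

def d2_step(W, X_cur, X_prev, G_cur, G_prev, lr):
    """One D2/Exact-Diffusion iteration (illustrative sketch).

    W      : (n, n) doubly-stochastic gossip (mixing) matrix
    X_cur  : (n, d) stacked node iterates x_i^(k)
    X_prev : (n, d) stacked node iterates x_i^(k-1)
    G_cur  : (n, d) stochastic gradients evaluated at X_cur
    G_prev : (n, d) stochastic gradients evaluated at X_prev
    lr     : step size gamma
    """
    # x^(k+1) = W (2 x^(k) - x^(k-1) - lr * (g^(k) - g^(k-1)))
    return W @ (2 * X_cur - X_prev - lr * (G_cur - G_prev))
```

The gradient-difference correction is what removes the data-heterogeneity term from the error bound, which is the paper's central theme.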
Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the methodology described in this specific paper. While it mentions a related project 'Blue Fog' in the related works section, it does not confirm that the code for *this* paper's work is available there.
Open Datasets | Yes | 7.3 Simulation with Real Datasets This subsection examines the performances of P-SGD, D-SGD, D2/ED, and MG-D2/ED with real datasets. We run experiments for the regularized logistic regression problem with... We consider two real datasets: MNIST (Deng, 2012) and COVTYPE.binary (Rossi and Ahmed, 2015).
Dataset Splits | No | The paper describes how the datasets were used for training and distributed among nodes to create heterogeneity (e.g., 'In COVTYPE.binary, we use 50,000 samples as training data...', 'half of the nodes maintain 54% positive samples...'), but it does not specify explicit train/test/validation splits for evaluating model generalization performance.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the simulations. It only mentions simulation settings like 'd = 10 and M = 1000', which are problem parameters.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers).
Experiment Setup | Yes | In our simulations, we set d = 10 and M = 1000... To control the data heterogeneity across the nodes, we first let each node i be associated with a local solution x_i⋆, generated as x_i⋆ = x⋆ + v_i, where x⋆ ~ N(0, I_d) is a randomly generated vector and v_i ~ N(0, σ_h² I_d) controls the similarity between the local solutions... At each iteration k, each node will randomly sample a row in A_i and the corresponding element in b_i and use them to evaluate the stochastic gradient. The metric for all simulations in this subsection is (1/n) Σ_{i=1}^n ‖x_i^(k) − x⋆‖²... The left plot in Fig. 1 lists the performances of all algorithms. Each algorithm utilizes the same learning rate, which decays by half every 2,000 gossip communications... To this end, we let σ_h² = 0.2... we let σ_h² = 0... The regularization coefficient ρ = 0.001 for all simulations.
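The row-sampling stochastic gradient and the consensus-error metric quoted above can both be written in a few lines. This is an illustrative sketch under the least-squares loss (1/2)(aᵀx − b)²; the helper names are choices made here, not the paper's.

```python
import numpy as np

def sgd_grad(A_i, b_i, x, rng):
    """Stochastic gradient of (1/2)(a^T x - b)^2 from one uniformly sampled row of (A_i, b_i)."""
    k = rng.integers(len(b_i))          # sample one row index
    a, bk = A_i[k], b_i[k]
    return (a @ x - bk) * a             # gradient of the single-sample least-squares loss

def consensus_error(X, x_star):
    """The paper's metric: (1/n) * sum_i ||x_i^(k) - x*||^2, for X of shape (n, d)."""
    return np.mean(np.sum((X - x_star) ** 2, axis=1))
```

In the experiment loop, every node would call `sgd_grad` on its local data at each iteration, and `consensus_error` would be logged against the number of gossip communications (with the step size halved every 2,000 of them).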