Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Local Steps Speed Up Local GD for Heterogeneous Distributed Logistic Regression

Authors: Michael Crawshaw, Blake Woodworth, Mingrui Liu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally study the behavior of Local GD under different choices of learning rate and local steps. We use two datasets: (1) a synthetic dataset with M = 2 clients and n = 1 data point per client, (2) a heterogeneous dataset of MNIST images with binary labels."
Researcher Affiliation | Academia | Michael Crawshaw (George Mason University, EMAIL); Blake Woodworth (George Washington University, EMAIL); Mingrui Liu (George Mason University, EMAIL)
Pseudocode | Yes | Algorithm 1: Local GD; Algorithm 2: Two-Stage Local GD; Algorithm 3: Local Gradient Flow
Open Source Code | No | "The paper does not contain any explicit statement about providing source code or a link to a code repository for the methodology described in this paper."
Open Datasets | Yes | "We use two datasets: (1) a synthetic dataset with M = 2 clients and n = 1 data point per client, (2) a heterogeneous dataset of MNIST images with binary labels. ... Following recent work on GD for logistic regression (Wu et al., 2024b;a), we also evaluate on a dataset of 1000 MNIST images."
Dataset Splits | Yes | "We partition the data into M = 5 client datasets with n = 200 data points each. This partitioning is done according to the protocol used by Karimireddy et al. (2020), where s% of each local dataset is allocated uniformly at random from the 1000 images, and the remaining (1 − s)% is allocated to each client in order from a subset of data that is sorted by label. ... For our dataset with 10 digits and M = 5 clients, we set s = 5%."
Hardware Specification | No | "The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments."
Software Dependencies | No | "The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9) needed to replicate the experiment."
Experiment Setup | Yes | "We set η according to the requirements of theoretical guarantees, i.e. η = 1/(KH) from Corollary 2, and η1 = 1/(KH), η2 = 1/H from Theorem 1. For the two-stage stepsize, we choose r0 (the number of rounds in the first stage) as a linear function of K, as required by Theorem 1. Accordingly, we set r0 = λK and tune λ to ensure that the loss remains stable when transitioning to the second stage. The final tuned values of λ are λ = 4 for the synthetic experiment and λ = 1/16 for the MNIST experiment. ... We train a ResNet-50 (He et al., 2016) for image classification on a distributed version of the CIFAR-10 dataset, using cross-entropy loss. For both algorithms, we train for R = 1500 communication rounds while varying the number of local steps K ∈ {1, 2, 4, 8, 16}. ... We tune the initial learning rate η with grid search over {0.003, 0.01, 0.03, 0.1, 0.3, 1.0} ... best choice was η = 0.03. We also applied learning rate decay by a factor of 0.5 after 750 rounds, and again after 1125 rounds. Lastly, we use a batch size of 128 for each local gradient update."
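The heterogeneous client split quoted under Dataset Splits (a fraction s assigned uniformly at random, the remainder dealt out in label-sorted order, following Karimireddy et al., 2020) can be sketched as below. The paper's code is not released, so `partition_clients`, its arguments, and the exact shard bookkeeping are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def partition_clients(labels, num_clients=5, s=0.05, seed=0):
    """Sketch of a Karimireddy-style heterogeneous split (assumed details):
    a fraction s of each client's shard is drawn uniformly at random,
    the remaining (1 - s) fraction is dealt out in label-sorted order."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    per_client = n // num_clients
    idx = rng.permutation(n)
    # split the shuffled indices into an iid pool and a label-sorted pool
    n_iid = int(round(s * per_client)) * num_clients
    iid_part, sorted_part = idx[:n_iid], idx[n_iid:]
    sorted_part = sorted_part[
        np.argsort(np.asarray(labels)[sorted_part], kind="stable")
    ]
    iid_per = n_iid // num_clients
    rest_per = len(sorted_part) // num_clients
    shards = []
    for m in range(num_clients):
        # each client: a small random slice plus a contiguous label range
        shard = np.concatenate([
            iid_part[m * iid_per:(m + 1) * iid_per],
            sorted_part[m * rest_per:(m + 1) * rest_per],
        ])
        shards.append(shard)
    return shards
```

With the paper's numbers (1000 images, M = 5, s = 5%), each client would receive 200 indices, of which 10 are random and 190 come from a contiguous label-sorted block.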
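The two-stage stepsize quoted under Experiment Setup (η1 = 1/(KH) for the first r0 = λK rounds, then η2 = 1/H per Theorem 1) reduces to a simple piecewise rule. A minimal sketch, assuming H is the smoothness constant and with a function name of my choosing:

```python
def two_stage_stepsize(r, K, H, lam):
    """Stepsize for communication round r under the paper's two-stage rule
    (sketch): eta1 = 1/(K*H) while r < r0 = lam*K, then eta2 = 1/H.
    The paper tunes lam to 4 (synthetic) and 1/16 (MNIST)."""
    r0 = lam * K
    return 1.0 / (K * H) if r < r0 else 1.0 / H
```

For example, with K = 8 and λ = 4, the first 32 rounds use the conservative rate 1/(8H) before switching to the larger rate 1/H.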