Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Local Steps Speed Up Local GD for Heterogeneous Distributed Logistic Regression
Authors: Michael Crawshaw, Blake Woodworth, Mingrui Liu
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally study the behavior of Local GD under different choices of learning rate and local steps. We use two datasets: (1) a synthetic dataset with M = 2 clients and n = 1 data point per client, (2) a heterogeneous dataset of MNIST images with binary labels. |
| Researcher Affiliation | Academia | Michael Crawshaw, George Mason University; Blake Woodworth, George Washington University; Mingrui Liu, George Mason University |
| Pseudocode | Yes | Algorithm 1: Local GD; Algorithm 2: Two-Stage Local GD; Algorithm 3: Local Gradient Flow |
| Open Source Code | No | The paper does not contain any explicit statement about providing source code or a link to a code repository for the methodology described in this paper. |
| Open Datasets | Yes | We use two datasets: (1) a synthetic dataset with M = 2 clients and n = 1 data point per client, (2) a heterogeneous dataset of MNIST images with binary labels. ... Following recent work on GD for logistic regression (Wu et al., 2024b;a), we also evaluate on a dataset of 1000 MNIST images. |
| Dataset Splits | Yes | We partition the data into M = 5 client datasets with n = 200 data points each. This partitioning is done according to the protocol used by Karimireddy et al. (2020), where s% of each local dataset is allocated uniformly at random from the 1000 images, and the remaining (1 − s)% is allocated to each client in order from a subset of data that is sorted by label. ... For our dataset with 10 digits and M = 5 clients, we set s = 5%. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9) needed to replicate the experiment. |
| Experiment Setup | Yes | We set η according to the requirements of theoretical guarantees, i.e. η = 1/(KH) from Corollary 2, and η₁ = 1/(KH), η₂ = 1/H from Theorem 1. For the two-stage stepsize, we choose r₀ (the number of rounds in the first stage) as a linear function of K, as required by Theorem 1. Accordingly, we set r₀ = λK and tune λ to ensure that the loss remains stable when transitioning to the second stage. The final tuned values of λ are λ = 4 for the synthetic experiment and λ = 1/16 for the MNIST experiment. ... We train a ResNet-50 (He et al., 2016) for image classification on a distributed version of the CIFAR-10 dataset, using cross-entropy loss. For both algorithms, we train for R = 1500 communication rounds while varying the number of local steps K ∈ {1, 2, 4, 8, 16}. ... We tune the initial learning rate η with grid search over {0.003, 0.01, 0.03, 0.1, 0.3, 1.0} ... best choice was η = 0.03. We also applied learning rate decay by a factor of 0.5 after 750 rounds, and again after 1125 rounds. Lastly, we use a batch size of 128 for each local gradient update. |
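
The partitioning protocol quoted in the Dataset Splits row can be illustrated with a short sketch. This is not the authors' code (the paper releases none). It is one plausible reading of the description, with several assumptions: the function name `partition_heterogeneous` and its parameters are hypothetical, the random portion is drawn without replacement, and the label-sorted portion is taken from the indices left over after the random draw, so client datasets do not overlap. The paper or Karimireddy et al. (2020) may handle these details differently.

```python
import numpy as np


def partition_heterogeneous(labels, num_clients=5, per_client=200,
                            s_percent=5.0, seed=0):
    """Split sample indices into heterogeneous client datasets.

    Hypothetical helper: s_percent of each client's data is drawn uniformly
    at random, and the rest is handed out in order from a label-sorted pool.
    Sampling without replacement and disjoint clients are assumptions.
    """
    labels = np.asarray(labels)
    total = num_clients * per_client
    if labels.shape[0] != total:
        raise ValueError(f"expected {total} labels, got {labels.shape[0]}")

    rng = np.random.default_rng(seed)
    n_random = int(round(per_client * s_percent / 100.0))
    n_sorted = per_client - n_random

    # Shuffle once; the first block feeds the uniformly random portions.
    perm = rng.permutation(total)
    random_pool = perm[: num_clients * n_random]
    rest = perm[num_clients * n_random:]

    # Sort the remaining indices by label so consecutive chunks are
    # dominated by a small number of classes.
    rest_sorted = rest[np.argsort(labels[rest], kind="stable")]

    clients = []
    for m in range(num_clients):
        random_part = random_pool[m * n_random:(m + 1) * n_random]
        sorted_part = rest_sorted[m * n_sorted:(m + 1) * n_sorted]
        clients.append(np.concatenate([random_part, sorted_part]))
    return clients


if __name__ == "__main__":
    # Stand-in labels: 10 digit classes with 100 samples each (not real MNIST).
    demo_labels = np.repeat(np.arange(10), 100)
    for m, idx in enumerate(partition_heterogeneous(demo_labels)):
        print(m, np.bincount(demo_labels[idx], minlength=10))
```

With the quoted settings (M = 5, n = 200, s = 5%), each client receives 10 randomly drawn points and 190 points from the label-sorted pool, so each client is dominated by roughly two digit classes. Raising `s_percent` toward 100 makes the clients progressively more homogeneous.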