Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Local Steps Speed Up Local GD for Heterogeneous Distributed Logistic Regression
Authors: Michael Crawshaw, Blake Woodworth, Mingrui Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally study the behavior of Local GD under different choices of learning rate and local steps. We use two datasets: (1) a synthetic dataset with M = 2 clients and n = 1 data point per client, (2) a heterogeneous dataset of MNIST images with binary labels. |
| Researcher Affiliation | Academia | Michael Crawshaw George Mason University EMAIL Blake Woodworth George Washington University EMAIL Mingrui Liu George Mason University EMAIL |
| Pseudocode | Yes | Algorithm 1 Local GD Algorithm 2 Two-Stage Local GD Algorithm 3 Local Gradient Flow |
| Open Source Code | No | The paper does not contain any explicit statement about providing source code or a link to a code repository for the methodology described in this paper. |
| Open Datasets | Yes | We use two datasets: (1) a synthetic dataset with M = 2 clients and n = 1 data point per client, (2) a heterogeneous dataset of MNIST images with binary labels. ... Following recent work on GD for logistic regression (Wu et al., 2024b;a), we also evaluate on a dataset of 1000 MNIST images. |
| Dataset Splits | Yes | We partition the data into M = 5 client datasets with n = 200 data points each. This partitioning is done according to the protocol used by Karimireddy et al. (2020), where s% of each local dataset is allocated uniformly at random from the 1000 images, and the remaining (1 − s)% is allocated to each client in order from a subset of data that is sorted by label. ... For our dataset with 10 digits and M = 5 clients, we set s = 5%. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9) needed to replicate the experiment. |
| Experiment Setup | Yes | We set η according to the requirements of theoretical guarantees, i.e. η = 1/(KH) from Corollary 2, and η1 = 1/(KH), η2 = 1/H from Theorem 1. For the two-stage stepsize, we choose r0 (the number of rounds in the first stage) as a linear function of K, as required by Theorem 1. Accordingly, we set r0 = λK and tune λ to ensure that the loss remains stable when transitioning to the second stage. The final tuned values of λ are λ = 4 for the synthetic experiment and λ = 1/16 for the MNIST experiment. ... We train a ResNet-50 (He et al., 2016) for image classification on a distributed version of the CIFAR-10 dataset, using cross-entropy loss. For both algorithms, we train for R = 1500 communication rounds while varying the number of local steps K ∈ {1, 2, 4, 8, 16}. ... We tune the initial learning rate η with grid search over {0.003, 0.01, 0.03, 0.1, 0.3, 1.0} ... best choice was η = 0.03. We also applied learning rate decay by a factor of 0.5 after 750 rounds, and again after 1125 rounds. Lastly, we use a batch size of 128 for each local gradient update. |
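The dataset-split protocol quoted above (s% of each client's data drawn uniformly at random, the rest assigned in label-sorted order, following Karimireddy et al., 2020) can be sketched as follows. The function name, seeding, and return format are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def partition_heterogeneous(labels, num_clients=5, s=0.05, seed=0):
    """Sketch of the s%-similarity split: a fraction s of each client's
    data is sampled uniformly at random, and the remaining (1 - s)
    fraction is assigned in label-sorted order, which makes the client
    datasets heterogeneous. Returns a list of index arrays, one per client.
    """
    rng = np.random.default_rng(seed)
    n_total = len(labels)
    n_per_client = n_total // num_clients      # e.g. 1000 / 5 = 200
    n_iid = int(s * n_per_client)              # uniformly random portion
    n_sorted = n_per_client - n_iid            # label-sorted portion

    perm = rng.permutation(n_total)
    iid_pool = perm[: n_iid * num_clients]     # random slice shared out to clients
    rest = perm[n_iid * num_clients :]
    sorted_pool = rest[np.argsort(labels[rest])]  # remainder sorted by label

    clients = []
    for m in range(num_clients):
        iid_part = iid_pool[m * n_iid : (m + 1) * n_iid]
        srt_part = sorted_pool[m * n_sorted : (m + 1) * n_sorted]
        clients.append(np.concatenate([iid_part, srt_part]))
    return clients
```

With the paper's settings (1000 MNIST images, 10 digits, M = 5, s = 5%), each client receives 10 random indices plus 190 label-sorted ones, for n = 200 per client.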
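The two-stage stepsize schedule described in the setup (η1 = 1/(KH) for the first r0 = λK rounds, then η2 = 1/H) can be illustrated with a minimal Local GD loop for logistic regression. This is a sketch under stated assumptions, not the authors' implementation: H stands for the smoothness constant of the logistic loss, and the data layout and function name are hypothetical:

```python
import numpy as np

def local_gd_two_stage(client_data, R, K, H, lam):
    """Sketch of two-stage Local GD for logistic regression.
    client_data: list of (X, y) pairs, one per client, with labels y in {-1, +1}.
    Stage 1 (first r0 = lam*K rounds) uses eta = 1/(K*H); stage 2 uses eta = 1/H.
    """
    d = client_data[0][0].shape[1]
    w = np.zeros(d)
    r0 = int(lam * K)                      # rounds in the small-stepsize stage
    for r in range(R):
        eta = 1.0 / (K * H) if r < r0 else 1.0 / H
        local_models = []
        for X, y in client_data:
            w_m = w.copy()
            for _ in range(K):             # K local gradient steps per round
                margins = y * (X @ w_m)
                # gradient of mean logistic loss: -y_i * x_i * sigmoid(-margin_i)
                g = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
                w_m -= eta * g
            local_models.append(w_m)
        w = np.mean(local_models, axis=0)  # communication: average local iterates
    return w
```

On a toy two-client problem in the spirit of the paper's synthetic setup (M = 2 clients, n = 1 point each), the schedule keeps the first-stage updates conservative and then accelerates once r ≥ r0.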