Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Sharp Gaussian approximations for Decentralized Federated Learning
Authors: Soham Bonnerjee, Sayar Karmakar, Wei Biao Wu
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present two generalized Gaussian approximation results for local SGD and explore their implications. ... Extensive simulations are provided to support our theoretical results. ... Finally, in Section 4, we validate our theoretical findings with extensive numerical exercises. |
| Researcher Affiliation | Academia | Soham Bonnerjee EMAIL Sayar Karmakar EMAIL Wei Biao Wu EMAIL |
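| Pseudocode | Yes | Algorithm 1 (local SGD). Input: initializations $\Theta_0 = (\theta_0^1, \ldots, \theta_0^K) \in \mathbb{R}^{d \times K}$; connection matrix $C$; synchronization parameter $\tau \in \mathbb{N}$; loss functions $f_k(\cdot, \xi_k)$, $\xi_k \sim P_k$, $k \in [K]$; weights $\{w_k\}_{k=1}^K$; number of iterations $n$; step-size schedule $\{\eta_t\}_{t=1}^n$. Let $E_\tau = \{\tau, 2\tau, \ldots, L\tau\}$, where $L = \lfloor n/\tau \rfloor$. For $t = 1, \ldots, n$: $\Theta_t = (\Theta_{t-1} - \eta_t G_t) C_t$, where $C_t = C$ if $t \in E_\tau$ and $C_t = I_K$ otherwise. (2.2) Output: $Y_n := K^{-1} \Theta_n \mathbf{1} = K^{-1} \sum_{k=1}^K \theta_n^k$. |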
| Open Source Code | Yes | All codes are available in github. |
| Open Datasets | Yes | G.2.1 Experiments based on MNIST dataset As a further application of Algorithm 2, we work on a federated learning (FL) setup with K = 5 clients collaboratively training a linear classifier on MNIST data. |
| Dataset Splits | No | For the purpose of the numerical exercises in this section, we choose $d = 2$ and $\beta_0 = (2, 3)^\top$, and let $\Gamma = \gamma I$ with $\gamma \geq 0$. In particular, $\gamma = 0$ corresponds to a fixed effect $\beta_0$ from which each client generates their observations. For each $K$, we generate $\Sigma_K$ uniformly from the set $\{1, \ldots, 5\}$, $D_K$ from the specification above, and keep them fixed throughout the corresponding experiments as $n$ varies. ... For each triplet of $(N, K, \gamma)$, we simulate $n_{\mathrm{sim}} = 500$ parallel independent local SGD chains with step-sizes $\eta_t = 0.7 t^{-0.85}$, and observations from the FRand-eff model in order to empirically simulate $U_n$. ... For a randomly selected set of $K_0 = 3$ clients, a label-flipping attack ... is injected at $t = 50$, and Algorithm 2 is employed to detect this attack. |
| Hardware Specification | No | The experiments are lightweight and run quickly on a modern laptop. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers within its main text. |
| Experiment Setup | Yes | Figure 1 shows how $d_c$ varies with varying $n$, $K$, $\tau$ when the step-size is kept fixed at $\eta_t = 0.3 t^{-0.75}$. ... We run the local SGD algorithm with $\tau = 5$, and $\eta_t = 0.5 t^{-\beta}$, for $\beta \in \{0.85, 0.9, 0.95\}$. ... In this section, we fix $N = 500$, $\tau = 20$, and let $K \in \{10, 25, 50\}$, and compare the quantiles of the maximum partial sums of local SGD $U_n$, Aggr-GA $U_n^{\text{Aggr-GA}}$, Client-GA $U_n^{\text{Client-GA}}$ and approximation by Brownian motion: $U_n^{\text{f-CLT}}$. Clearly, Aggr-GA seems to be performing the best, as suggested by Theorems 3.1 and 3.2. Furthermore, $U_n^{\text{f-CLT}}$ consistently has the worst approximation. |
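
The update rule quoted in the Pseudocode row, $\Theta_t = (\Theta_{t-1} - \eta_t G_t) C_t$, can be illustrated with a minimal sketch. This is not the authors' code. It assumes that $G_t$ stacks one stochastic gradient per client (the excerpt does not define $G_t$), it ignores the weights $w_k$ because they do not appear in (2.2) as quoted, and the names `local_sgd`, `quad_grad`, `MU`, and `C_AVG` are hypothetical. The toy loss is a quadratic with Gaussian noise, and the connection matrix is plain uniform averaging. Only the values $\tau = 5$ and $\eta_t = 0.5 t^{-0.85}$ are taken from the Experiment Setup row.

```python
import random


def local_sgd(theta0, C, tau, grad, n, step, seed=0):
    """Sketch of Theta_t = (Theta_{t-1} - eta_t * G_t) C_t.

    theta0: list of K client vectors (the columns of Theta_0), each of length d.
    C:      K x K connection matrix, applied only when t is a multiple of tau.
    grad:   grad(k, theta_k, rng) -> stochastic gradient of client k
            (assumption: G_t collects these per-client gradients).
    step:   step(t) -> step size eta_t.
    Returns Y_n, the average of the K client iterates after n iterations.
    """
    rng = random.Random(seed)
    K, d = len(theta0), len(theta0[0])
    theta = [list(col) for col in theta0]
    for t in range(1, n + 1):
        eta = step(t)
        # Local step: every client moves along its own stochastic gradient.
        half = []
        for k in range(K):
            g = grad(k, theta[k], rng)
            half.append([theta[k][i] - eta * g[i] for i in range(d)])
        if t % tau == 0:
            # Synchronization (t in E_tau): right-multiply by C, so the new
            # column j is sum_k half[k] * C[k][j].
            theta = [[sum(half[k][i] * C[k][j] for k in range(K))
                      for i in range(d)] for j in range(K)]
        else:
            # C_t = I_K: the local iterates are kept unchanged.
            theta = half
    return [sum(theta[k][i] for k in range(K)) / K for i in range(d)]


# Hypothetical toy problem: client k minimizes E[0.5 * ||theta - xi||^2]
# with xi ~ N(MU[k], I), so the average loss is minimized at the mean of MU.
MU = [(1.0, 2.0), (3.0, 4.0), (2.0, 2.0), (2.0, 4.0)]
K = len(MU)


def quad_grad(k, theta, rng):
    # Gradient of 0.5 * ||theta - xi||^2 at a freshly sampled xi.
    return [theta[i] - (MU[k][i] + rng.gauss(0.0, 1.0))
            for i in range(len(theta))]


# Assumed connection matrix: uniform averaging over all K clients.
C_AVG = [[1.0 / K] * K for _ in range(K)]

y_n = local_sgd(
    theta0=[[0.0, 0.0] for _ in range(K)],
    C=C_AVG,
    tau=5,
    grad=quad_grad,
    n=2000,
    step=lambda t: 0.5 * t ** -0.85,
)
target = [sum(m[i] for m in MU) / K for i in range(2)]
print("Y_n =", y_n, "target =", target)
```

In this toy setting the gradients are linear in the iterate, so the client average follows ordinary SGD on the averaged loss regardless of $\tau$; differences between synchronization schedules would only show up in the spread of the individual client iterates.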