Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Unified Linear Speedup Analysis of Federated Averaging and Nesterov FedAvg

Authors: Zhaonan Qu, Kaixiang Lin, Zhaojian Li, Jiayu Zhou, Zhengyuan Zhou

JAIR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical studies of the algorithms in various settings have supported our theoretical results. In this section, we empirically examine the linear speedup convergence of FedAvg and Nesterov accelerated FedAvg in various settings, including strongly convex functions, convex smooth functions, and overparameterized objectives. Setup. Following the experimental setting in Stich (2019), we conduct experiments on both synthetic datasets and the real-world dataset w8a (Platt, 1999) (d = 300, n = 49749).
Researcher Affiliation Academia Zhaonan Qu (EMAIL), Department of Economics and Department of Management Science and Engineering, Stanford University, Stanford, CA 94305, USA; Kaixiang Lin (EMAIL), Department of Computer Science and Engineering, Michigan State University, Lansing, MI 48824, USA; Zhaojian Li (EMAIL), Department of Mechanical Engineering, Michigan State University, Lansing, MI 48824, USA; Jiayu Zhou (EMAIL), Department of Computer Science and Engineering, Michigan State University, Lansing, MI 48824, USA; Zhengyuan Zhou (EMAIL), Department of Technology, Operations, and Statistics, Stern School of Business, New York University, New York, NY 10003, USA
Pseudocode Yes Algorithm 1 FedAvg: Federated Averaging
1: Server input: initial model w0, initial step size α0, local steps E.
2: Client input:
3: for each round r = 0, 1, ..., R, where r = t/E do
4:   Sample clients St ⊆ {1, ..., N}
5:   Broadcast w to all clients k ∈ St
6:   for each client k ∈ St do
7:     initialize local model wkt = w
8:     for t = rE + 1, ..., (r+1)E do
9:       wkt+1 = wkt − αt gt,k
10:    end for
11:  end for
12:  Average the local models at the server: wt = (1/|St|) Σk∈St wkt
13: end for
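The transcribed algorithm can be illustrated with a minimal NumPy sketch. This is a hypothetical illustration, not the authors' implementation: the client gradient oracles, the participation fraction, and the constant step size are all stand-ins for the quantities gt,k, St, and αt in Algorithm 1.

```python
import numpy as np

def fedavg(grad_fns, w0, rounds, local_steps, lr, participation=1.0, rng=None):
    """Minimal FedAvg sketch (Algorithm 1): each round, sampled clients run
    `local_steps` SGD steps from the broadcast model; the server then
    averages the resulting local models."""
    if rng is None:
        rng = np.random.default_rng(0)
    w = np.asarray(w0, dtype=float).copy()
    n_clients = len(grad_fns)
    m = max(1, int(participation * n_clients))
    for _ in range(rounds):
        # Sample clients S_t without replacement and broadcast w.
        sampled = rng.choice(n_clients, size=m, replace=False)
        local_models = []
        for k in sampled:
            wk = w.copy()
            for _ in range(local_steps):
                wk -= lr * grad_fns[k](wk)  # local SGD step
            local_models.append(wk)
        # Server averaging step: w_t = (1/|S_t|) sum of local models.
        w = np.mean(local_models, axis=0)
    return w
```

With full participation and quadratic client objectives f_k(w) = ||w - c_k||^2 / 2, the iterate converges to the mean of the c_k, matching the intuition that FedAvg minimizes the average of the client objectives.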
Open Source Code No The paper does not provide an explicit statement about releasing code or a link to a code repository for the methodology described.
Open Datasets Yes Following the experimental setting in Stich (2019), we conduct experiments on both synthetic datasets and real-world dataset w8a (Platt, 1999) (d = 300, n = 49749).
Dataset Splits No The paper mentions distributing data across devices (e.g., 'evenly distributed on 1, 2, 4, 8, 16, 32 devices' and 'uniformly sample 50% devices without replacement'). However, it does not specify how the overall dataset (e.g., w8a) is split into training, validation, and test sets for model evaluation, or details on individual device data splits.
Hardware Specification No The paper does not explicitly mention any specific hardware (e.g., GPU/CPU models, memory) used for conducting the experiments. It only refers to 'devices' in the context of federated learning clients, not the experimental computational environment.
Software Dependencies No The paper mentions 'Fedscale' as a benchmarking suite in a citation (Lai et al., 2022) but does not list any specific software dependencies with version numbers for its own implementation or experiments.
Experiment Setup Yes We initialize all runs with w0 = 0d and measure the number of iterations to reach the target accuracy ϵ. For each configuration (E, K), we extensively search the learning rate ηt = min(η0, nc/(1+t)), where η0 ∈ {0.1, 0.12, 1, 32} according to different problems, and c can take the values c = 2^i, i ∈ Z. ... Strongly convex objective: ... regularization parameter is set to λ = 1/n ≈ 2e-5. ... ϵ = 0.005. ... initial learning rate η0 ∈ {1, 32} and c0 = 1/8. ... Convex smooth objective: ... ϵ = 0.02. ... initial learning rate η0 ∈ {1, 32} and c0 = 1/8. ... Linear regression: ... ϵ = 0.02. ... initial learning rate η0 ∈ {0.1, 0.12} and c0 = 1/256. ... Partial participation: ... uniformly sample 50% of devices without replacement. ... Nesterov accelerated FedAvg: We set βt = 0.1 and search αt in the same way as ηt in FedAvg.
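The decaying step-size schedule and the grid search over c quoted above can be sketched as follows. This is a hedged reconstruction: `run_to_target` is a hypothetical callback (not from the paper) that runs the optimizer with a given schedule and reports iterations to reach the target accuracy ϵ.

```python
def step_size(t, eta0, c, n):
    """Step-size schedule eta_t = min(eta0, n*c / (1+t)) as quoted above."""
    return min(eta0, n * c / (1.0 + t))

def search_c(run_to_target, eta0, n, exponents=range(-10, 5)):
    """Grid-search c over powers of two (c = 2**i, i in Z, truncated here).

    `run_to_target(schedule)` is a hypothetical callback: it runs training
    with the given step-size schedule and returns the iteration count (or
    any cost) needed to reach the target accuracy. Returns the best c and
    its cost."""
    best_c, best_cost = None, float("inf")
    for i in exponents:
        c = 2.0 ** i
        cost = run_to_target(lambda t: step_size(t, eta0, c, n))
        if cost < best_cost:
            best_c, best_cost = c, cost
    return best_c, best_cost
```

The schedule caps the step size at η0 early on and decays like O(1/t) afterwards, which is the standard choice for strongly convex SGD-type analyses.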