A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning

Authors: Zhenyu Sun, Ermin Wei

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we numerically show that Fed GDA-GT outperforms Local SGDA. In this section, we numerically measure the performance of Fed GDA-GT compared to Local SGDA with full gradients on a personal laptop by solving (1). We first perform experiments on quadratic objective functions with x and y uncoupled. Then, we test our algorithm on the robust linear regression problem.
Researcher Affiliation | Academia | Zhenyu Sun, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, zhenyusun2026@u.northwestern.edu; Ermin Wei, Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, ermin.wei@northwestern.edu
Pseudocode | Yes | Algorithm 1 (Local SGDA) and Algorithm 2 (Fed GDA-GT); a structural sketch of the local-update rounds is given below.
Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets | No | We generate A_i, b_i as follows: for each agent, every entry of A_i, denoted [A_i]_{kl}, is drawn from the Gaussian distribution N(0, (0.5i)^2). To construct b_i, we generate a random reference point θ_i ∈ R^d, where θ_i ∼ N(µ_i, I_{d×d}). Each element of µ_i is drawn from N(α, 1) with α ∼ N(0, 100). Then b_i = A_i θ_i + ε_i with ε_i ∼ N(0, 0.25 I_{n_i×n_i}). We set the model dimension to d = 50 and the number of samples per agent to n_i = 500, and train the models with m = 20 agents using Algorithm 1 and Algorithm 2, respectively. We generate local models and data as follows: the local model x_i is generated by a multivariate normal distribution. (A sketch of this generation scheme is given below.)
Dataset Splits | No | We set the model dimension to d = 50 and the number of samples per agent to n_i = 500, and train the models with m = 20 agents using Algorithm 1 and Algorithm 2, respectively. The paper describes data generation but does not provide specific train/validation/test splits.
Hardware Specification | No | In this section, we numerically measure the performance of Fed GDA-GT compared to Local SGDA with full gradients on a personal laptop by solving (1).
Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., library or framework names with version numbers).
Experiment Setup | Yes | In order to compare them, the learning rate is 10^-4 for both algorithms, and we choose Local SGDA with K = 1, which is equivalent to centralized GDA, as the baseline. Figure 1 shows the trajectories of Algorithms 1 and 2 under objective functions constructed by (13), respectively. Different numbers of local updates are selected (K = 20 and K = 50). For each case, we choose the same constant η for both Local SGDA and Fed GDA-GT.
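
Several rows above quote the paper's experiments on "solving (1)" without reproducing the problem statement. As a point of reference, the standard federated minimax formulation over m agents, which (1) presumably instantiates (this is an assumption; the paper's exact notation and any constraints are not reproduced here), is

```latex
\min_{x \in \mathbb{R}^{d_1}} \; \max_{y \in \mathbb{R}^{d_2}} \;
    f(x, y) \;=\; \frac{1}{m} \sum_{i=1}^{m} f_i(x, y)
```

where f_i is the local objective accessible only through agent i's data.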
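The "Pseudocode" row points to Algorithm 1 (Local SGDA) and Algorithm 2 (Fed GDA-GT), which are not reproduced in this report. The sketch below only illustrates the general structure of local gradient-descent-ascent rounds with full gradients, plus an optional gradient-tracking-style correction in the spirit of FedLin/SCAFFOLD; it should not be read as the paper's exact Algorithm 2, and all names are hypothetical.

```python
import numpy as np

def run_round(x_bar, y_bar, grads, K, eta, gradient_tracking=False):
    """One communication round of local GDA over m agents.

    grads: list of callables grads[i](x, y) -> (gx_i, gy_i), the full local
    gradients of f_i. With gradient_tracking=True each local step is corrected
    by the difference between the global and the local gradient evaluated at
    the round's starting point (a FedLin/SCAFFOLD-style correction; the
    paper's Fed GDA-GT correction may differ in detail).
    """
    m = len(grads)
    # Local and averaged gradients at the shared starting point (for corrections).
    start = [g(x_bar, y_bar) for g in grads]
    gx_bar = sum(s[0] for s in start) / m
    gy_bar = sum(s[1] for s in start) / m

    xs, ys = [], []
    for i, grad_i in enumerate(grads):
        x, y = x_bar.copy(), y_bar.copy()
        for _ in range(K):
            gx, gy = grad_i(x, y)
            if gradient_tracking:
                gx = gx - start[i][0] + gx_bar
                gy = gy - start[i][1] + gy_bar
            x = x - eta * gx   # descent step on x
            y = y + eta * gy   # ascent step on y
        xs.append(x)
        ys.append(y)

    # Server averages the local iterates.
    return sum(xs) / m, sum(ys) / m
```

With K = 1 and gradient_tracking=False, averaging the one-step local iterates reduces to a single centralized GDA step on the averaged gradient, which matches the K = 1 baseline described in the "Experiment Setup" row.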
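The data-generation scheme quoted in the "Open Datasets" row can be reproduced directly. A minimal NumPy sketch follows the quoted description; the agent indexing (1..m), the random seed, and the matrix shape A_i of size n_i x d are assumptions consistent with b_i = A_i θ_i + ε_i.

```python
import numpy as np

def generate_agent_data(m=20, n_i=500, d=50, seed=0):
    """Synthetic data for the robust linear regression experiment:
    [A_i]_{kl} ~ N(0, (0.5 i)^2), theta_i ~ N(mu_i, I_d), mu_i entries ~ N(alpha, 1),
    alpha ~ N(0, 100), and b_i = A_i theta_i + eps_i with eps_i ~ N(0, 0.25 I)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(0.0, 10.0)              # alpha ~ N(0, 100): std = 10
    data = []
    for i in range(1, m + 1):                  # agents indexed 1..m (assumed)
        A_i = rng.normal(0.0, 0.5 * i, size=(n_i, d))   # std = 0.5 i
        mu_i = rng.normal(alpha, 1.0, size=d)
        theta_i = rng.normal(mu_i, 1.0)                 # theta_i ~ N(mu_i, I_d)
        eps_i = rng.normal(0.0, 0.5, size=n_i)          # variance 0.25 => std 0.5
        b_i = A_i @ theta_i + eps_i
        data.append((A_i, b_i))
    return data
```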
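Finally, the comparison described in the "Experiment Setup" row can be wired together with the run_round sketch above. The objective used here is a stand-in uncoupled quadratic (the paper tests quadratics with x and y uncoupled, but its objective (13) is not reproduced in this report), and the number of rounds is an assumed value.

```python
import numpy as np

# Stand-in uncoupled quadratic, NOT the paper's objective (13):
#   f_i(x, y) = 0.5 * ||x - a_i||^2 - 0.5 * ||y - c_i||^2
rng = np.random.default_rng(0)
d, m = 50, 20
a = [rng.normal(size=d) for _ in range(m)]
c = [rng.normal(size=d) for _ in range(m)]
grads = [lambda x, y, a_i=a_i, c_i=c_i: (x - a_i, -(y - c_i))
         for a_i, c_i in zip(a, c)]

eta = 1e-4                        # same constant step size for both algorithms
num_rounds = 200                  # number of communication rounds (assumed)
for K in (1, 20, 50):             # K = 1 plays the role of the centralized GDA baseline
    for tracking in (False, True):    # False: Local GDA, True: tracking-corrected variant
        x_bar, y_bar = np.zeros(d), np.zeros(d)
        for _ in range(num_rounds):
            x_bar, y_bar = run_round(x_bar, y_bar, grads, K, eta,
                                     gradient_tracking=tracking)
```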