A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning
Authors: Zhenyu Sun, Ermin Wei
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we numerically show that Fed GDA-GT outperforms Local SGDA. In this section, we numerically measure the performance of Fed GDA-GT compared to Local SGDA with full gradients on a personal laptop by solving (1). We first perform experiments on quadratic objective functions with x and y uncoupled. Then, we test our algorithm on the robust linear regression problem. |
| Researcher Affiliation | Academia | Zhenyu Sun Department of Electrical and Computer Engineering Northwestern University Evanston, IL 60208 zhenyusun2026@u.northwestern.edu Ermin Wei Department of Electrical and Computer Engineering Northwestern University Evanston, IL 60208 ermin.wei@northwestern.edu |
| Pseudocode | Yes | Algorithm 1 Local SGDA and Algorithm 2 Fed GDA-GT |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | No | We generate A_i, b_i as follows: For each agent, every entry of A_i, denoted by [A_i]_{kl}, is generated by the Gaussian distribution N(0, (0.5i)^2). To construct b_i, we generate a random reference point θ_i ∈ R^d, where θ_i ~ N(µ_i, I_{d×d}). Each element of µ_i is drawn from N(α, 1) with α ~ N(0, 100). Then b_i = A_i θ_i + ϵ_i with ϵ_i ~ N(0, 0.25 I_{n_i×n_i}). We set the dimension of the model as d = 50 and the number of samples as n_i = 500, and train the models with m = 20 agents by Algorithm 1 and Algorithm 2, respectively. We generate local models and data as follows: the local model x_i is generated by a multivariate normal distribution. (A data-generation sketch follows the table.) |
| Dataset Splits | No | We set the dimension of the model as d = 50 and the number of samples as n_i = 500, and train the models with m = 20 agents by Algorithm 1 and Algorithm 2, respectively. The paper describes data generation but does not provide specific train/validation/test splits. |
| Hardware Specification | No | In this section, we numerically measure the performance of Fed GDA-GT compared to Local SGDA with full gradients on a personal laptop by solving (1). |
| Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., library or framework names with version numbers). |
| Experiment Setup | Yes | In order to compare them, the learning rate is 10^-4 for both algorithms and we choose Local SGDA with K = 1, which is equivalent to a centralized GDA, as the baseline. Figure 1 shows the trajectories of Algorithms 1 and 2 under objective functions constructed by (13), respectively. Different numbers of local updates are selected (with K = 20 and K = 50). For each case, we choose the same constant η for both Local SGDA and Fed GDA-GT. (A training-loop sketch follows the table.) |
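
The data-generation procedure quoted in the Open Datasets row translates almost directly into code. The sketch below is a minimal NumPy rendering of that description; the function name `generate_agent_data` and the choice to draw α independently for each agent are assumptions not stated in the quoted text.

```python
import numpy as np

def generate_agent_data(i, d=50, n_i=500, rng=None):
    """Minimal sketch of the synthetic data generation quoted above.

    The agent index i is taken as 1-based so that the entry standard
    deviation 0.5 * i differs across agents, matching N(0, (0.5 i)^2).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Every entry [A_i]_{kl} ~ N(0, (0.5 i)^2).
    A_i = rng.normal(0.0, 0.5 * i, size=(n_i, d))
    # alpha ~ N(0, 100); each element of mu_i ~ N(alpha, 1).
    # (Assumption: alpha is redrawn per agent.)
    alpha = rng.normal(0.0, np.sqrt(100.0))
    mu_i = rng.normal(alpha, 1.0, size=d)
    # Reference point theta_i ~ N(mu_i, I_{d x d}).
    theta_i = rng.multivariate_normal(mu_i, np.eye(d))
    # b_i = A_i theta_i + eps_i with eps_i ~ N(0, 0.25 I_{n_i x n_i}).
    eps_i = rng.normal(0.0, 0.5, size=n_i)
    b_i = A_i @ theta_i + eps_i
    return A_i, b_i

# m = 20 agents with d = 50 and n_i = 500 samples each, as in the quoted setup.
agent_data = [generate_agent_data(i) for i in range(1, 21)]
```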
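
For the Experiment Setup row, the following is a minimal sketch of the local-update-then-average structure of Algorithm 1 (Local SGDA run with full gradients) under the quoted hyperparameters (η = 10^-4, K local updates, m = 20 agents). The function name `local_gda` and the `grads` callable interface are illustrative, and the gradient-tracking correction that distinguishes Fed GDA-GT (Algorithm 2) is not reproduced here.

```python
import numpy as np

def local_gda(grads, x0, y0, eta=1e-4, K=20, rounds=100):
    """Sketch: each agent runs K full-gradient descent-ascent steps,
    then the server averages the iterates (Algorithm 1 structure).

    grads: list of per-agent callables (x, y) -> (grad_x, grad_y).
    """
    m = len(grads)
    xs = [x0.copy() for _ in range(m)]
    ys = [y0.copy() for _ in range(m)]
    for _ in range(rounds):
        for i in range(m):
            for _ in range(K):               # K local updates per round
                gx, gy = grads[i](xs[i], ys[i])
                xs[i] = xs[i] - eta * gx     # descent on x
                ys[i] = ys[i] + eta * gy     # ascent on y
        x_bar = np.mean(xs, axis=0)          # server-side averaging
        y_bar = np.mean(ys, axis=0)
        xs = [x_bar.copy() for _ in range(m)]
        ys = [y_bar.copy() for _ in range(m)]
    return x_bar, y_bar
```

With K = 1, each round reduces to a single averaged gradient descent-ascent step on the average objective, which matches the centralized GDA baseline mentioned in the quoted setup.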