Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity
Authors: Qihao Zhou, Haishan Ye, Luo Luo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove SVOGS can achieve the ε-duality gap within communication rounds of O(δD²/ε), communication complexity of O(n + √n·δD²/ε), and local gradient calls of O(n + (√n·δ + L)D²/ε · log(1/ε)), where n is the number of nodes, δ is the degree of the second-order similarity, L is the smoothness parameter, and D is the diameter of the constraint set. We can verify that all of the above complexities (nearly) match the corresponding lower bounds. For the specific µ-strongly-convex-µ-strongly-concave case, our algorithm has upper bounds on communication rounds, communication complexity, and local gradient calls of O(δ/µ · log(1/ε)), O((n + √n·δ/µ) log(1/ε)), and O((n + (√n·δ + L)/µ) log(1/ε)) respectively, which are also nearly tight. Furthermore, we conduct numerical experiments to show the empirical advantages of the proposed method. (These bounds are typeset in LaTeX after this table.) |
| Researcher Affiliation | Collaboration | Qihao Zhou, School of Data Science, Fudan University, zhouqh20@fudan.edu.cn; Haishan Ye, School of Management, Xi'an Jiaotong University, and SGIT AI Lab, State Grid Corporation of China, yehaishan@xjtu.edu.cn; Luo Luo, School of Data Science, Fudan University, and Shanghai Key Laboratory for Contemporary Applied Mathematics, luoluo@fudan.edu.cn |
| Pseudocode | Yes | Algorithm 1 Stochastic Variance-Reduced Optimistic Gradient Sliding (SVOGS) |
| Open Source Code | Yes | We include the code in the supplemental materials. |
| Open Datasets | Yes | We test the algorithms on real-world datasets a9a (N = 32,561, d = 123), w8a (N = 49,749, d = 300), and covtype (N = 581,012, d = 54) from the LIBSVM repository [9] and set the number of nodes to n = 500. |
| Dataset Splits | No | The paper states it uses real-world datasets, but does not provide specific train/validation/test dataset splits (percentages, counts, or explicit predefined split references) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | We implement all of the methods in Python 3.9 with NumPy and run on a machine with an AMD Ryzen™ 7 4800H 8-core CPU with Radeon Graphics (2.90 GHz) and 16 GB RAM. |
| Software Dependencies | Yes | We implement all of the methods in Python 3.9 with NumPy and run on a machine with an AMD Ryzen™ 7 4800H 8-core CPU with Radeon Graphics (2.90 GHz) and 16 GB RAM. |
| Experiment Setup | Yes | We tune the step-size η of SVOGS from {0.01, 0.1, 1}. The probability p is tuned from {p0, 5p0, 10p0}, where p0 = 1/min{√n + δ/µ}. The batch size b is determined from {b0/10, b0/5, b0}, with b0 = 1/p0. We set the other parameters by following our theoretical analysis. We set the average weight as γ = 1 − p. For the momentum parameter, we set α = 1 for the convex-concave case and α = max{1 − ηµ/(6(1 − γ)), 1 − pηµ/(2γ + ηµ)} for the strongly-convex-strongly-concave case, where we estimate µ by max{λ, β} for problem (13). For the sub-problem solver, we set its step-size according to the smoothness parameter of the sub-problem, i.e., 1/(L + 1/η). In addition, we estimate the smoothness parameter L and the similarity parameter δ by following the strategy in Appendix C of Beznosikov et al. [5]. (A hedged configuration sketch follows this table.) |
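
For readability, the complexity bounds quoted in the Research Type row are typeset below. The radicals (√n) are restored where PDF extraction appears to have dropped them, which is an assumption about the paper's original notation rather than a verbatim quote.

```latex
% Convex-concave case ($\varepsilon$-duality gap):
\begin{align*}
\text{communication rounds:} \quad & O\!\left(\tfrac{\delta D^2}{\varepsilon}\right) \\
\text{communication complexity:} \quad & O\!\left(n + \tfrac{\sqrt{n}\,\delta D^2}{\varepsilon}\right) \\
\text{local gradient calls:} \quad & O\!\left(n + \tfrac{(\sqrt{n}\,\delta + L)D^2}{\varepsilon}\log\tfrac{1}{\varepsilon}\right)
\end{align*}
% $\mu$-strongly-convex-$\mu$-strongly-concave case:
\begin{align*}
\text{communication rounds:} \quad & O\!\left(\tfrac{\delta}{\mu}\log\tfrac{1}{\varepsilon}\right) \\
\text{communication complexity:} \quad & O\!\left(\left(n + \tfrac{\sqrt{n}\,\delta}{\mu}\right)\log\tfrac{1}{\varepsilon}\right) \\
\text{local gradient calls:} \quad & O\!\left(\left(n + \tfrac{\sqrt{n}\,\delta + L}{\mu}\right)\log\tfrac{1}{\varepsilon}\right)
\end{align*}
```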
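
The Experiment Setup row combines a small tuning grid with parameters derived from the theory. Below is a minimal Python/NumPy sketch of that configuration logic, assuming hypothetical names (`build_svogs_grid` and its arguments); the code in the supplemental materials is the authoritative implementation and may differ. The second argument of the min in the quoted p0 appears lost in extraction, so the sketch uses the surviving term only.

```python
import itertools

import numpy as np

def build_svogs_grid(n, delta, mu, L):
    """Yield candidate SVOGS hyperparameter settings per the quoted setup."""
    # Base sampling probability p0 = 1/min{sqrt(n) + delta/mu} as quoted;
    # we use the surviving term of the min (an assumption).
    p0 = 1.0 / (np.sqrt(n) + delta / mu)
    b0 = 1.0 / p0  # base batch size b0 = 1/p0
    for eta, p, b in itertools.product(
        [0.01, 0.1, 1.0],                # step size eta
        [p0, 5 * p0, 10 * p0],           # client-sampling probability p
        [max(1, round(b0 / 10)), max(1, round(b0 / 5)), round(b0)],  # batch size b
    ):
        gamma = 1.0 - p  # average weight gamma = 1 - p
        # Momentum: alpha = 1 in the convex-concave case; the max rule
        # below applies to the strongly-convex-strongly-concave case.
        alpha = max(1 - eta * mu / (6 * (1 - gamma)),
                    1 - p * eta * mu / (2 * gamma + eta * mu))
        inner_step = 1.0 / (L + 1.0 / eta)  # sub-problem solver step size
        yield {"eta": eta, "p": p, "b": b, "gamma": gamma,
               "alpha": alpha, "inner_step": inner_step}

# Example with illustrative values for delta, mu, and L (n = 500 as in the paper):
for cfg in build_svogs_grid(n=500, delta=0.1, mu=0.01, L=1.0):
    print(cfg)
```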