Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity

Authors: Qihao Zhou, Haishan Ye, Luo Luo

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove SVOGS can achieve an ε-duality gap within O(δD²/ε) communication rounds, O(n + √n·δD²/ε) communication complexity, and O(n + (√n·δ + L)D²/ε · log(1/ε)) local gradient calls, where n is the number of nodes, δ is the degree of the second-order similarity, L is the smoothness parameter, and D is the diameter of the constraint set. All of the above complexities (nearly) match the corresponding lower bounds. For the specific µ-strongly-convex-µ-strongly-concave case, our algorithm has upper bounds on communication rounds, communication complexity, and local gradient calls of O(δ/µ · log(1/ε)), O((n + √n·δ/µ) log(1/ε)), and O((n + (√n·δ + L)/µ) log(1/ε)) respectively, which are also nearly tight. Furthermore, we conduct numerical experiments to show the empirical advantages of the proposed method. (These bounds are evaluated numerically in the first sketch below the table.)
Researcher Affiliation | Collaboration | Qihao Zhou, School of Data Science, Fudan University, zhouqh20@fudan.edu.cn; Haishan Ye, School of Management, Xi'an Jiaotong University, and SGIT AI Lab, State Grid Corporation of China, yehaishan@xjtu.edu.cn; Luo Luo, School of Data Science, Fudan University, and Shanghai Key Laboratory for Contemporary Applied Mathematics, luoluo@fudan.edu.cn
Pseudocode | Yes | Algorithm 1: Stochastic Variance-Reduced Optimistic Gradient Sliding (SVOGS). (A generic variance-reduced optimistic-gradient step is sketched below the table.)
Open Source Code | Yes | We include the code in the supplemental materials.
Open Datasets | Yes | We test the algorithms on the real-world datasets a9a (N = 32,561, d = 123), w8a (N = 49,749, d = 300), and covtype (N = 581,012, d = 54) from the LIBSVM repository [9] and set the number of nodes to n = 500. (A loading-and-partitioning sketch follows the table.)
Dataset Splits | No | The paper states it uses real-world datasets, but does not provide specific train/validation/test dataset splits (percentages, counts, or explicit predefined split references) needed to reproduce the data partitioning.
Hardware Specification | Yes | We implement all of the methods in Python 3.9 with NumPy and run on a machine with an AMD Ryzen™ 7 4800H 8-core CPU with Radeon Graphics at 2.90 GHz and 16 GB RAM.
Software Dependencies | Yes | We implement all of the methods in Python 3.9 with NumPy and run on a machine with an AMD Ryzen™ 7 4800H 8-core CPU with Radeon Graphics at 2.90 GHz and 16 GB RAM.
Experiment Setup | Yes | We tune the step-size η of SVOGS from {0.01, 0.1, 1}. The probability p is tuned from {p0, 5p0, 10p0}, where p0 = 1/min{√n + δ/µ}. The batch size b is determined from {⌈b0/10⌉, ⌈b0/5⌉, b0}, with b0 = 1/p0. We set the other parameters by following our theoretical analysis. We set the average weight as γ = 1 − p. For the momentum parameter, we set α = 1 for the convex-concave case and α = max{1 − ηµ/(6(1 − γ)), 1 − pηµ/(2γ + ηµ)} for the strongly-convex-strongly-concave case, where we estimate µ by max{λ, β} for problem (13). For the sub-problem solver, we set its step-size according to the smoothness parameter of the sub-problem, i.e., 1/(L + 1/η). In addition, we estimate the smoothness parameter L and the similarity parameter δ by following the strategy in Appendix C of Beznosikov et al. [5]. (The resulting tuning grid is spelled out in the last sketch below.)
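
To make the complexity bounds quoted under Research Type concrete, here is a minimal sketch that evaluates them numerically. Only n = 500 matches the paper's experiments; δ, L, D, ε, and µ are hypothetical placeholders, and the constants hidden by the O(·) notation are ignored.

```python
import math

# Hypothetical problem parameters; only n = 500 matches the paper's experiments.
n, delta, L, D, eps, mu = 500, 1.0, 10.0, 1.0, 1e-3, 0.1

# Convex-concave case (bounds quoted above, up to absolute constants).
rounds = delta * D**2 / eps                                    # O(δD²/ε)
comm = n + math.sqrt(n) * delta * D**2 / eps                   # O(n + √n·δD²/ε)
grads = n + (math.sqrt(n) * delta + L) * D**2 / eps * math.log(1 / eps)

# µ-strongly-convex-µ-strongly-concave case.
rounds_sc = delta / mu * math.log(1 / eps)
comm_sc = (n + math.sqrt(n) * delta / mu) * math.log(1 / eps)
grads_sc = (n + (math.sqrt(n) * delta + L) / mu) * math.log(1 / eps)

print(f"convex-concave: rounds~{rounds:.2e}, comm~{comm:.2e}, grads~{grads:.2e}")
print(f"strongly-convex-concave: rounds~{rounds_sc:.2e}, comm~{comm_sc:.2e}, grads~{grads_sc:.2e}")
```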
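The paper's Algorithm 1 is not reproduced in this report, so the sketch below shows only the generic pattern its name suggests: a loopless variance-reduced gradient estimator (full gradient with probability p, minibatch correction otherwise) combined with an optimistic, past-extrapolated gradient step. Every function name and the exact update form are assumptions; this is not the authors' Algorithm 1.

```python
import numpy as np

def vr_optimistic_step(z, g_prev, g_anchor, z_anchor,
                       grad_i, grad_full, n, p, b, eta, rng):
    """One hypothetical step of a variance-reduced optimistic gradient
    method; a generic sketch, NOT the paper's Algorithm 1 (SVOGS).

    grad_i(i, z): gradient of the i-th component operator at z.
    grad_full(z): average gradient over all n components at z.
    """
    if rng.random() < p:
        # With probability p, refresh the full gradient (new anchor point).
        g = grad_full(z)
        g_anchor, z_anchor = g, z.copy()
    else:
        # Otherwise, correct the anchor gradient with a size-b minibatch.
        idx = rng.choice(n, size=b, replace=False)
        g = g_anchor + sum(grad_i(i, z) - grad_i(i, z_anchor) for i in idx) / b
    # Optimistic update: extrapolate using the previous gradient estimate.
    z_new = z - eta * (2.0 * g - g_prev)
    return z_new, g, g_anchor, z_anchor
```

In SVOGS the gradient step is additionally "slid" into a local proximal sub-problem that each node solves inexactly (cf. the sub-problem solver in the Experiment Setup row); that outer/inner structure is omitted here.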
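The LIBSVM datasets named in the Open Datasets row are publicly downloadable. Below is a minimal loading-and-partitioning sketch, assuming scikit-learn is available and the a9a file has already been downloaded; the even random split across n = 500 nodes is an assumption, since the paper's exact partitioning scheme is not quoted here.

```python
import numpy as np
from sklearn.datasets import load_svmlight_file

# "a9a" must be downloaded first, e.g. from the LIBSVM repository.
X, y = load_svmlight_file("a9a")       # N = 32,561 samples, d = 123 features
n_nodes = 500

# Assumed partitioning: shuffle sample indices, then split them evenly
# across the n = 500 nodes.
rng = np.random.default_rng(0)
parts = np.array_split(rng.permutation(X.shape[0]), n_nodes)
local_data = [(X[idx], y[idx]) for idx in parts]

print(len(local_data), local_data[0][0].shape)   # 500 nodes, ~65 samples each
```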
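Finally, the quoted experiment setup translates directly into a tuning grid. A sketch, assuming placeholder values for δ, µ, and L (the paper estimates these from data following Beznosikov et al. [5]) and reading the quoted min{√n + δ/µ} as simply √n + δ/µ:

```python
import math
from itertools import product

# Placeholder problem constants; the paper estimates δ, µ, and L from data.
n, delta, mu, L = 500, 1.0, 0.1, 10.0

p0 = 1.0 / (math.sqrt(n) + delta / mu)   # quoted as p0 = 1/min{√n + δ/µ}
b0 = round(1.0 / p0)

etas = [0.01, 0.1, 1.0]                  # step-size grid for η
ps = [p0, 5 * p0, 10 * p0]               # probability grid for p
bs = [math.ceil(b0 / 10), math.ceil(b0 / 5), b0]

for eta, p, b in product(etas, ps, bs):
    gamma = 1.0 - p                      # average weight γ = 1 − p
    # Momentum for the strongly-convex-strongly-concave case (α = 1 otherwise).
    alpha = max(1 - eta * mu / (6 * (1 - gamma)),
                1 - p * eta * mu / (2 * gamma + eta * mu))
    inner_step = 1.0 / (L + 1.0 / eta)   # sub-problem solver step-size
    print(f"η={eta}, p={p:.3f}, b={b}, γ={gamma:.3f}, "
          f"α={alpha:.4f}, inner step={inner_step:.4f}")
```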