Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems

Authors: Luo Luo, Haishan Ye, Zhichao Huang, Tong Zhang

NeurIPS 2020

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct the experiments by using distributionally robust optimization with nonconvex regularized logistic loss [5, 14, 21, 46]. Given a dataset $\{(a_i, b_i)\}_{i=1}^{n}$, where $a_i \in \mathbb{R}^d$ is the feature vector of the $i$-th sample and $b_i \in \{-1, 1\}$ the corresponding label, the minimax formulation is $\min_{x \in \mathbb{R}^d} \max_{y \in Y} f(x, y) \triangleq \frac{1}{n} \sum_{i=1}^{n} y_i l_i(x) - V(y) + g(x)$, where $l_i(x) = \log(1 + \exp(-b_i a_i^\top x))$, $g(x) = \lambda_2 \sum_{i=1}^{d} \frac{\alpha x_i^2}{1 + \alpha x_i^2}$ is the nonconvex regularizer of [5], $V(y) = \frac{\lambda_1}{2} \|ny - \mathbf{1}\|_2^2$, and $Y = \{y \in \mathbb{R}^n : 0 \le y_i \le 1, \sum_{i=1}^{n} y_i = 1\}$ is the simplex. Following the settings of Yan et al. [46] and Kohler and Lucchi [21], we let $\lambda_1 = 1/n^2$, $\lambda_2 = 10^{-3}$ and $\alpha = 10$ for the experiments. We compare the performance of SREDA with the baseline algorithms GDAmax, GDA, SGDA [25] and Minimax PPA [26] on six real-world datasets, a9a, w8a, gisette, mushrooms, sido0 and rcv1, whose details are listed in Table 2.
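For concreteness, below is a minimal NumPy sketch of the objective above, not the authors' MATLAB implementation; the function name dro_objective and the random example data are assumptions, while the default values λ1 = 1/n^2, λ2 = 10^-3 and α = 10 follow the quoted settings.

```python
import numpy as np

def dro_objective(x, y, A, b, lam1=None, lam2=1e-3, alpha=10.0):
    """f(x, y) = (1/n) * sum_i y_i * l_i(x) - V(y) + g(x), as in the quoted formulation."""
    n, d = A.shape
    lam1 = 1.0 / n**2 if lam1 is None else lam1
    # l_i(x) = log(1 + exp(-b_i * a_i^T x)): logistic loss of sample i
    losses = np.log1p(np.exp(-b * (A @ x)))
    # V(y) = (lam1 / 2) * ||n*y - 1||_2^2: penalty keeping the weights y near uniform
    V = 0.5 * lam1 * np.sum((n * y - 1.0) ** 2)
    # g(x) = lam2 * sum_i alpha * x_i^2 / (1 + alpha * x_i^2): nonconvex regularizer of [5]
    g = lam2 * np.sum(alpha * x**2 / (1.0 + alpha * x**2))
    return (y @ losses) / n - V + g

# Example usage on random data; y starts at the uniform point of the simplex Y.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
b = rng.choice([-1.0, 1.0], size=100)
x0, y0 = np.zeros(20), np.full(100, 1.0 / 100)
print(dro_objective(x0, y0, A, b))
```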
Researcher Affiliation Academia Luo Luo^1, Haishan Ye^2, Zhichao Huang^1, Tong Zhang^1; ^1 Department of Mathematics, The Hong Kong University of Science and Technology; ^2 Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen
Pseudocode Yes Algorithm 1 SGDmax; Algorithm 2 SGDA; Algorithm 3 SREDA; Algorithm 4 Concave Maximizer; Algorithm 5 SREDA (Finite-sum Case)
Open Source Code No The paper does not provide an explicit statement or link to its open-source code.
Open Datasets Yes We compare the performance of SREDA with the baseline algorithms GDAmax, GDA, SGDA [25] and Minimax PPA [26] on six real-world datasets, a9a, w8a, gisette, mushrooms, sido0 and rcv1, whose details are listed in Table 2. The dataset sido0 comes from the Causality Workbench and the others can be downloaded from the LIBSVM repository.
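As an aside on how such LIBSVM-format files are typically read, a minimal Python sketch is given below; the paper's experiments use MATLAB, and the local file name "a9a" is an assumption about where the downloaded dataset is stored.

```python
# A minimal sketch, assuming the LIBSVM-format file "a9a" has already been downloaded
# from the LIBSVM repository; this is not the authors' code.
from sklearn.datasets import load_svmlight_file

X, y = load_svmlight_file("a9a")   # sparse feature matrix and labels
print(X.shape, y.shape)            # dimensions depend on the downloaded file
print(set(y[:10]))                 # labels are +/-1, matching b_i in the formulation
```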
Dataset Splits No The paper lists the datasets used but does not provide specific train/validation/test splits, percentages, or cross-validation details for reproducibility.
Hardware Specification Yes Our experiments are conducted on a workstation with Intel Xeon Gold 5120 CPU and 256GB memory.
Software Dependencies Yes We use MATLAB 2018a to run the code and the operating system is Ubuntu 18.04.4 LTS.
Experiment Setup Yes The parameters of the algorithms are chosen as follows: the stepsizes of all algorithms are tuned from {10^-3, 10^-2, 10^-1, 1} and the stepsize ratio is kept in {10, 10^2, 10^3}. For the stochastic algorithms SGDA and SREDA, the mini-batch size is chosen from {10, 100, 200}. For SREDA, we use the finite-sum version (Algorithm 5 with the first case of Theorem 2) and let q = m = n/S_2 heuristically. The initialization of SREDA is based on PSARAH with K_0 = 5, b = 1 and m = 20. For Minimax PPA, we tune the proximal parameter from {1, 10, 100} and the momentum parameter from {0.2, 0.5, 0.7}. Each inner loop of Minimax PPA runs Maximin-AG2 five times, and each run contains five AGD iterations.
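A hypothetical restatement of the quoted search ranges as a plain Python grid is sketched below; the paper provides no configuration file, and the grouping of parameters into grids is an assumption made only for illustration.

```python
from itertools import product

# Hypothetical restatement of the quoted search ranges; not the authors' configuration.
stepsizes = [1e-3, 1e-2, 1e-1, 1.0]   # tuned for all algorithms
stepsize_ratios = [10, 10**2, 10**3]  # stepsize ratio kept in this set
minibatch_sizes = [10, 100, 200]      # SGDA and SREDA only
ppa_proximal = [1, 10, 100]           # Minimax PPA proximal parameter
ppa_momentum = [0.2, 0.5, 0.7]        # Minimax PPA momentum parameter

# Grid for the stochastic algorithms (SGDA, SREDA): stepsize x ratio x mini-batch size.
sgda_grid = list(product(stepsizes, stepsize_ratios, minibatch_sizes))
print(len(sgda_grid), "stochastic-algorithm configurations")

# Grid for Minimax PPA: stepsize x proximal parameter x momentum parameter.
ppa_grid = list(product(stepsizes, ppa_proximal, ppa_momentum))
print(len(ppa_grid), "Minimax PPA configurations")
```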