SAPD+: An Accelerated Stochastic Method for Nonconvex-Concave Minimax Problems

Authors: Xuan Zhang, Necdet Serhat Aybat, Mert Gurbuzbalaban

NeurIPS 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate the efficiency of SAPD+ on a distributionally robust learning problem with a nonconvex regularizer and also on a multi-class classification problem in deep learning. ... 6 Numerical experiments
Researcher Affiliation Academia Xuan Zhang, Department of Industrial and Manufacturing Engineering, Pennsylvania State University, University Park, PA, USA. xxz358@psu.edu. Necdet Serhat Aybat, Department of Industrial and Manufacturing Engineering, Pennsylvania State University, University Park, PA, USA. nsa10@psu.edu. Mert Gürbüzbalaban, Department of Management Science and Information Systems, Rutgers University, Piscataway, NJ, USA. mg1366@rutgers.edu
Pseudocode Yes Algorithm 1 SAPD Algorithm ... Algorithm 2 SAPD+ Algorithm ... Algorithm 3 VR-SAPD Algorithm
Open Source Code Yes Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets Yes We perform experiments on three data sets: i) a9a with n = 32561, d = 123; ii) gisette with n = 6000, d = 5000; iii) sido0 with n = 12678, d = 4932. The dataset sido0 is obtained from Causality Workbench while the others can be downloaded from the LIBSVM repository.
Dataset Splits No The paper does not provide explicit details about training/validation/test splits, only mentioning training accuracy results and parameter tuning strategies.
Hardware Specification Yes The experiments are conducted on a PC with 3.6 GHz Intel Core i7 CPU and NVIDIA RTX2070 GPU.
Software Dependencies No The paper does not explicitly state specific software dependencies with version numbers.
Experiment Setup Yes Parameter tuning. We set the parameters according to [40, 26, 20], i.e., α = 10, η₁ = 10⁻³, η₂ = 1/n². ... We compare SAPD+ and SAPD+VR against the PASGDA [4], SREDA [26], SMDA, and SMDA-VR [17] algorithms. As suggested in [26], we tune the primal stepsizes of all the algorithms based on a grid search over the set {10⁻³, 10⁻², 10⁻¹}, and the ratio of the primal stepsize to the dual stepsize, i.e., τ/σ, is varied over the set {10, 10², 10³, 10⁴}. For all variance-reduction-based algorithms, i.e., SAPD+VR, SREDA, and SMDA-VR, we tune the large batch size |B| from the set {3000, 6000}, and the small batch size b = |I| via a grid search over the set {10, 100, 200}. For the frequency parameter q, we let q = b = |I| for SAPD+VR and SMDA-VR (as suggested in [17]); for SREDA, when we set q and m (SREDA's inner-loop iteration number) to O(n/|I|) as suggested in [26], we noticed that SREDA does not perform well against SAPD+VR and SMDA-VR. Therefore, to further optimize the performance of SREDA, we tune q and m via a grid search over {10, 100, 200}. For the methods without variance reduction, i.e., SAPD+, SMDA, and PASGDA, we also use mini-batches to estimate the gradients and tune the batch size from {10, 100, 200} as well. For SAPD+ and SAPD+VR, we tune the momentum θ from {0.8, 0.85, 0.9} and the inner iteration number N from {10, 50, 100}.
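The tuning protocol quoted above can be sketched as an exhaustive grid search. This is a minimal illustration, not the authors' code: the grids mirror the values quoted from the paper, while `evaluate` is a hypothetical stand-in for one full training run (a real run would train SAPD+ with the given hyperparameters and return, e.g., final training loss).

```python
from itertools import product

# Search spaces quoted from the paper's parameter-tuning description.
PRIMAL_STEPSIZES = [1e-3, 1e-2, 1e-1]   # grid for primal stepsize tau
RATIOS = [1e1, 1e2, 1e3, 1e4]           # tau/sigma ratios; dual stepsize sigma = tau/ratio
MOMENTUM = [0.8, 0.85, 0.9]             # theta for SAPD+ / SAPD+VR
INNER_ITERS = [10, 50, 100]             # inner iteration number N

def evaluate(tau, sigma, theta, N):
    """Hypothetical stand-in objective (lower is better); replace with an
    actual training run of the algorithm being tuned."""
    return abs(tau - 1e-2) + abs(sigma - 1e-5) + (1 - theta) + N * 1e-4

def grid_search():
    """Return the hyperparameter tuple minimizing evaluate() over the grid."""
    best_score, best_cfg = float("inf"), None
    for tau, ratio, theta, N in product(PRIMAL_STEPSIZES, RATIOS,
                                        MOMENTUM, INNER_ITERS):
        sigma = tau / ratio  # dual stepsize derived from the ratio grid
        score = evaluate(tau, sigma, theta, N)
        if score < best_score:
            best_score, best_cfg = score, (tau, sigma, theta, N)
    return best_cfg

best = grid_search()
print(best)
```

For the variance-reduction variants, the same loop would additionally range over the large batch size {3000, 6000} and small batch size {10, 100, 200} grids quoted above.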