SAPD+: An Accelerated Stochastic Method for Nonconvex-Concave Minimax Problems
Authors: Xuan Zhang, Necdet Serhat Aybat, Mert Gurbuzbalaban
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficiency of SAPD+ on a distributionally robust learning problem with a nonconvex regularizer and also on a multi-class classification problem in deep learning. ... (Section 6: Numerical experiments) |
| Researcher Affiliation | Academia | Xuan Zhang, Department of Industrial and Manufacturing Engineering, Pennsylvania State University, University Park, PA, USA, xxz358@psu.edu; Necdet Serhat Aybat, Department of Industrial and Manufacturing Engineering, Pennsylvania State University, University Park, PA, USA, nsa10@psu.edu; Mert Gürbüzbalaban, Department of Management Science and Information Systems, Rutgers University, Piscataway, NJ, USA, mg1366@rutgers.edu |
| Pseudocode | Yes | Algorithm 1 SAPD Algorithm ... Algorithm 2 SAPD+ Algorithm ... Algorithm 3 VR-SAPD Algorithm |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | Yes | We perform experiments on three data sets: i) a9a with n = 32561, d = 123; ii) gisette with n = 6000, d = 5000; iii) sido0 with n = 12678, d = 4932. The dataset sido0 is obtained from Causality Workbench3 while the others can be downloaded from LIBSVM repository4. |
| Dataset Splits | No | The paper does not provide explicit details about training/validation/test splits, only mentioning training accuracy results and parameter tuning strategies. |
| Hardware Specification | Yes | The experiments are conducted on a PC with 3.6 GHz Intel Core i7 CPU and NVIDIA RTX2070 GPU. |
| Software Dependencies | No | The paper does not explicitly state specific software dependencies with version numbers. |
| Experiment Setup | Yes | Parameter tuning. We set the parameters according to [40, 26, 20], i.e., α = 10, η1 = 10^-3, η2 = 1/n^2. ... We compare SAPD+ and SAPD+VR against PASGDA [4], SREDA [26], SMDA, SMDA-VR [17] algorithms. As suggested in [26], we tune the primal stepsizes of all the algorithms based on a grid search over the set {10^-3, 10^-2, 10^-1}, and the ratio of the primal stepsize to the dual stepsize, i.e., τ/σ, is varied to take values from the set {10, 10^2, 10^3, 10^4}. For all variance reduction-based algorithms, i.e., for SAPD+VR, SREDA, SMDA-VR, we tune the large batch size \|B\| from the set {3000, 6000}, and the small batch size b = \|I\| from a grid search over the set {10, 100, 200}. For the frequency parameter q, we let q = b = \|I\| for SAPD+VR and SMDA-VR (as suggested in [17]); for SREDA, when we set q and m (SREDA's inner-loop iteration number) to O(n/\|I\|) as suggested in [26], we noticed that SREDA does not perform well against SAPD+VR and SMDA-VR. Therefore, to optimize the performance of SREDA further, we tune q and m from a grid search over {10, 100, 200}. For methods without variance reduction, i.e., for SAPD+, SMDA, and PASGDA, we also use mini-batches to estimate the gradients and tune the batch size from {10, 100, 200} as well. For SAPD+ and SAPD+VR, we tune the momentum θ from {0.8, 0.85, 0.9} and the inner iteration number N from {10, 50, 100}. |
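The tuning procedure quoted above is an exhaustive sweep over a small Cartesian grid. A minimal sketch of enumerating that grid is shown below; the variable names (`tau`, `sigma`, `B`, `b`) are illustrative and not taken from the authors' code, and the sketch covers only the stepsize and batch-size axes, omitting the momentum and inner-iteration grids.

```python
# Sketch: enumerate the hyperparameter grid described in the paper's
# parameter-tuning paragraph. Names are hypothetical, not the authors' code.
from itertools import product

taus = [1e-3, 1e-2, 1e-1]        # primal stepsizes, {10^-3, 10^-2, 10^-1}
ratios = [10, 1e2, 1e3, 1e4]     # tau/sigma ratios, {10, 10^2, 10^3, 10^4}
large_batches = [3000, 6000]     # |B| for variance-reduced methods
small_batches = [10, 100, 200]   # b = |I|

# Every combination is one candidate configuration; the dual stepsize
# sigma is derived from the ratio tau/sigma.
configs = [
    {"tau": t, "sigma": t / r, "B": B, "b": b}
    for t, r, B, b in product(taus, ratios, large_batches, small_batches)
]
print(len(configs))  # 3 * 4 * 2 * 3 = 72 configurations
```

Each configuration would then be run and scored; the paper reports results for the best-performing setting per method.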