High-probability complexity bounds for stochastic non-convex minimax optimization
Authors: Yassine Laguel, Yasa Syed, Necdet Serhat Aybat, Mert Gürbüzbalaban
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also present numerical results on a nonconvex/PL problem with synthetic data and on distributionally robust optimization problems with real data, illustrating our theoretical findings. ... Section 4 (Numerical Illustrations): In this section, we illustrate the performance of sm-AGDA. We consider an NCPL problem with synthetic data, as well as a nonconvex DRO problem using real datasets. |
| Researcher Affiliation | Academia | Yassine Laguel, Laboratoire Jean Alexandre Dieudonné, Université Côte d'Azur, Nice, France (yassine.laguel@univ-cotedazur.fr); Yasa Syed, Department of Statistics, Rutgers University, Piscataway, New Jersey, USA (yasa.syed@rutgers.edu); Necdet Serhat Aybat, Department of Industrial Engineering, Penn State University, University Park, PA, USA (nsa10@psu.edu); Mert Gürbüzbalaban, Rutgers Business School, Rutgers University, Piscataway, New Jersey, USA (mg1366@rutgers.edu) |
| Pseudocode | Yes | Algorithm 1 sm-AGDA (a hedged iteration sketch follows the table) |
| Open Source Code | Yes | For all datasets, the primal stepsize τ1 of sm-AGDA is tuned via a grid-search over {10^{-k} : 1 ≤ k ≤ 4}. The dual stepsize τ2 is set as τ2 = τ1/48. Similarly, β is estimated through a grid-search over {10^{-k} : 3 ≤ k ≤ 5}. The parameter p is also tuned similarly on a grid; our code is provided as a supplementary document for the details. (A hedged grid-search sketch follows the table.) |
| Open Datasets | Yes | We consider three standard datasets for this problem, which are summarized as follows: The sido0 dataset [50] has d1 = 4932 and d2 = 12678. The gisette dataset [26] has d1 = 5000 and d2 = 6000. Finally, the a9a dataset [13] has d1 = 123 and d2 = 32561. |
| Dataset Splits | No | The paper mentions training phases ('early phase of the training', 'later phases') and epochs, and uses a grid-search for hyperparameter tuning. However, it does not specify explicit training/validation/test dataset splits, percentages, or absolute counts for reproducibility of data partitioning. |
| Hardware Specification | Yes | For synthetic experiments, we used an ASUS Laptop model Q540VJ with 13th Generation Intel Core i9-13900H using 16GB RAM and 1TB SSD hard drive. For the DRO experiments, we used a high-performance computing cluster with automatic GPU selection (NVIDIA RTX 3050, RTX 3090, A100, or Tesla P100) based on GPU availability, ensuring optimal use of computational resources. |
| Software Dependencies | No | The paper states that its code is provided as a supplementary document and implies a Python implementation, but it does not specify particular software libraries or dependencies with version numbers (e.g., versions of deep learning frameworks or numerical libraries) needed to replicate the experiments. |
| Experiment Setup | Yes | The parameters of the problem are explicitly available as µ = 2m2 and ℓ = max{12m1, 8m2, K}. To illustrate Theorem 11, we set β = τ2µ/1600, τ2 = τ1/48, p = 2ℓ, and we considered two cases: τ1 = 1/(3ℓ) (long step) and τ1 = 1/(12ℓ) (short step) to explore the behavior of sm-AGDA for different stepsizes. We generated N = 25 sample paths for T = 10,000 iterations... (A hedged parameter sketch follows the table.) |
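For readers checking the pseudocode row: the paper's Algorithm 1 is sm-AGDA. Below is a minimal sketch of one possible reading, assuming the standard smoothed-AGDA template (primal descent on a proximally smoothed objective, dual ascent, and slow averaging of a smoothing center z); the names `grad_x`/`grad_y` and the loop structure are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sm_agda(grad_x, grad_y, x0, y0, tau1, tau2, beta, p, T):
    """Illustrative sketch of a smoothed stochastic AGDA loop.

    grad_x(x, y) / grad_y(x, y) are assumed to return stochastic
    gradients of f; z is the center of the proximal term (p/2)*||x - z||^2.
    """
    x, y, z = np.array(x0, float), np.array(y0, float), np.array(x0, float)
    for _ in range(T):
        # Primal descent on the smoothed objective f(x, y) + (p/2)*||x - z||^2.
        x = x - tau1 * (grad_x(x, y) + p * (x - z))
        # Dual ascent at the updated primal point.
        y = y + tau2 * grad_y(x, y)
        # Slow averaging of the smoothing center toward the new iterate.
        z = z + beta * (x - z)
    return x, y
```

The point of the sketch is only the order of the three updates (primal, dual, smoothing center); the paper's actual Algorithm 1 should be consulted for the precise stochastic-oracle and stepsize conditions.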
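The tuning protocol in the Open Source Code row reads as a plain grid search. Here is a sketch under the reconstructed grids {10^{-k} : 1 ≤ k ≤ 4} for τ1 and {10^{-k} : 3 ≤ k ≤ 5} for β, with a placeholder `run_and_score` standing in for a full training run (the real criterion lives in the paper's supplementary code):

```python
import itertools

def run_and_score(tau1, tau2, beta):
    # Placeholder objective: a real run would train sm-AGDA with these
    # stepsizes and return the evaluation metric used in the paper.
    return abs(tau1 - 1e-2) + abs(beta - 1e-4)

tau1_grid = [10.0 ** -k for k in range(1, 5)]  # {1e-1, ..., 1e-4}
beta_grid = [10.0 ** -k for k in range(3, 6)]  # {1e-3, ..., 1e-5}

best_score, best_cfg = float("inf"), None
for tau1, beta in itertools.product(tau1_grid, beta_grid):
    tau2 = tau1 / 48.0  # dual stepsize tied to the primal one, as in the paper
    score = run_and_score(tau1, tau2, beta)
    if score < best_score:
        best_score, best_cfg = score, (tau1, tau2, beta)

print("best (tau1, tau2, beta):", best_cfg)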
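Finally, the Experiment Setup row's parameter choices are pure arithmetic once m1, m2, and K are known. The values below are hypothetical placeholders (the paper derives them from its synthetic NCPL instance); the formulas mirror the quoted settings for Theorem 11:

```python
m1, m2, K = 1.0, 1.0, 1.0              # hypothetical problem constants

mu = 2 * m2                            # µ = 2 m2
ell = max(12 * m1, 8 * m2, K)          # ℓ = max{12 m1, 8 m2, K}
p = 2 * ell                            # smoothing parameter p = 2ℓ

for label, tau1 in [("long step", 1 / (3 * ell)), ("short step", 1 / (12 * ell))]:
    tau2 = tau1 / 48                   # dual stepsize τ2 = τ1 / 48
    beta = tau2 * mu / 1600            # β = τ2 µ / 1600
    print(f"{label}: tau1={tau1:.4g}, tau2={tau2:.4g}, beta={beta:.4g}, p={p:.4g}")
```

Per the quoted setup, each of the two stepsize regimes is then run for N = 25 sample paths of T = 10,000 iterations.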