High-probability complexity bounds for stochastic non-convex minimax optimization

Authors: Yassine Laguel, Yasa Syed, Necdet Serhat Aybat, Mert Gürbüzbalaban

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also present numerical results on a nonconvex/PL problem with synthetic data and on distributionally robust optimization problems with real data, illustrating our theoretical findings. ... 4 Numerical Illustrations: In this section, we illustrate the performance of sm-AGDA. We consider an NCPL problem with synthetic data, as well as a nonconvex DRO problem using real datasets.
Researcher Affiliation | Academia | Yassine Laguel, Laboratoire Jean Alexandre Dieudonné, Université Côte d'Azur, Nice, France, yassine.laguel@univ-cotedazur.fr; Yasa Syed, Department of Statistics, Rutgers University, Piscataway, New Jersey, USA, yasa.syed@rutgers.edu; Necdet Serhat Aybat, Department of Industrial Engineering, Penn State University, University Park, PA, USA, nsa10@psu.edu; Mert Gürbüzbalaban, Rutgers Business School, Rutgers University, Piscataway, New Jersey, USA, mg1366@rutgers.edu
Pseudocode | Yes | Algorithm 1: sm-AGDA
Open Source Code | Yes | For all datasets, the primal stepsize τ₁ of sm-AGDA is tuned via a grid-search over {10^{-k}, 1 ≤ k ≤ 4}. The dual stepsize τ₂ is set as τ₂ = τ₁/48. Similarly, β is estimated through a grid-search over {10^{-k}, 3 ≤ k ≤ 5}. The parameter p is also tuned similarly on a grid; our code is provided as a supplementary document for the details.
Open Datasets | Yes | We consider three standard datasets for this problem, which are summarized as follows: The sido0 dataset [50] has d₁ = 4932 and d₂ = 12678. The gisette dataset [26] has d₁ = 5000 and d₂ = 6000. Finally, the a9a dataset [13] has d₁ = 123 and d₂ = 32561.
Dataset Splits | No | The paper mentions training phases ('early phase of the training', 'later phases') and epochs, and uses a grid-search for hyperparameter tuning. However, it does not specify explicit training/validation/test dataset splits, percentages, or absolute counts needed to reproduce the data partitioning.
Hardware Specification | Yes | For synthetic experiments, we used an ASUS Laptop model Q540VJ with 13th Generation Intel Core i9-13900H using 16GB RAM and 1TB SSD hard drive. For the DRO experiments, we used a high-performance computing cluster with automatic GPU selection (NVIDIA RTX 3050, RTX 3090, A100, or Tesla P100) based on GPU availability, ensuring optimal use of computational resources.
Software Dependencies | No | The paper states that its code is provided as a supplementary document and implicitly uses Python, but it does not specify any particular software libraries or dependencies with version numbers (e.g., specific versions of deep learning frameworks or numerical libraries) needed to replicate the experiments.
Experiment Setup | Yes | The parameters of the problem are explicitly available as µ = 2m₂ and ℓ = max{12m₁, 8m₂, K}. To illustrate Theorem 11, we set β = τ₂µ/1600, τ₂ = τ₁/48, p = 2ℓ, and we considered two cases: τ₁ = 1/(3ℓ) (long step) and τ₁ = 1/(12ℓ) (short step) to explore the behavior of sm-AGDA for different stepsizes. We generated N = 25 sample paths for T = 10,000 iterations...
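
The synthetic-data setup quoted in the Experiment Setup row fixes every algorithm parameter as a function of the problem constants ℓ and µ. The following Python sketch is only a rough illustration of how those choices (τ₁ ∈ {1/(3ℓ), 1/(12ℓ)}, τ₂ = τ₁/48, β = τ₂µ/1600, p = 2ℓ) could drive a generic stochastic smoothed-AGDA loop; the update order, the auxiliary smoothing sequence z, and the stochastic gradient oracles grad_x/grad_y are assumptions, not a transcription of Algorithm 1 from the paper.

```python
import numpy as np

def sm_agda(grad_x, grad_y, x0, y0, ell, mu, long_step=True, T=10_000, seed=0):
    """Sketch of a stochastic smoothed-AGDA loop with the quoted parameter choices.

    grad_x(x, y, rng) and grad_y(x, y, rng) are hypothetical stochastic
    gradient oracles for the objective f; the smoothed surrogate adds a
    proximal term (p/2) * ||x - z||^2 to the primal update.
    """
    rng = np.random.default_rng(seed)
    tau1 = 1.0 / (3 * ell) if long_step else 1.0 / (12 * ell)  # long vs. short step
    tau2 = tau1 / 48.0          # dual stepsize tied to the primal one
    beta = tau2 * mu / 1600.0   # smoothing-sequence stepsize
    p = 2.0 * ell               # proximal (smoothing) parameter

    x, y, z = np.array(x0, float), np.array(y0, float), np.array(x0, float)
    for _ in range(T):
        # Primal descent step on f(x, y) + (p/2) * ||x - z||^2.
        x = x - tau1 * (grad_x(x, y, rng) + p * (x - z))
        # Dual ascent step.
        y = y + tau2 * grad_y(x, y, rng)
        # Averaging update of the smoothing sequence.
        z = z + beta * (x - z)
    return x, y
```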
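
For the DRO experiments, the tuning protocol quoted in the Open Source Code row (τ₁ over {10^{-k}, 1 ≤ k ≤ 4}, τ₂ = τ₁/48, β over {10^{-k}, 3 ≤ k ≤ 5}, and p on a grid whose range is not stated) amounts to a small outer search loop. The sketch below assumes hypothetical run_sm_agda and validation_metric callables standing in for the authors' training and evaluation routines.

```python
import itertools

def tune_sm_agda(run_sm_agda, validation_metric, p_grid):
    """Hypothetical grid search mirroring the quoted DRO tuning protocol.

    p_grid is left to the caller because the quoted text does not state
    the grid used for p.
    """
    tau1_grid = [10.0 ** (-k) for k in range(1, 5)]  # {10^{-k}, 1 <= k <= 4}
    beta_grid = [10.0 ** (-k) for k in range(3, 6)]  # {10^{-k}, 3 <= k <= 5}

    best_score, best_cfg = float("inf"), None
    for tau1, beta, p in itertools.product(tau1_grid, beta_grid, p_grid):
        tau2 = tau1 / 48.0  # dual stepsize tied to the primal one
        score = validation_metric(run_sm_agda(tau1=tau1, tau2=tau2, beta=beta, p=p))
        if score < best_score:
            best_score, best_cfg = score, {"tau1": tau1, "tau2": tau2,
                                           "beta": beta, "p": p}
    return best_cfg
```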