Universal Gradient Descent Ascent Method for Nonconvex-Nonconcave Minimax Optimization
Authors: Taoli Zheng, Linglingzhi Zhu, Anthony Man-Cho So, José Blanchet, Jiajin Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our idea and demonstrate the universality of DS-GDA, we provide a visual representation of the feasible symmetric parameter selections by relating the regularizer to Lipschitz constants and step sizes. This graphical illustration, depicted in Figure 1a, reveals that numerous parameter settings can be chosen to guarantee convergence, which showcases the flexibility of DS-GDA. An experiment for a nonconvex-nonconcave problem has also been done to test the efficiency of using symmetric parameters. ... In all cases, DS-GDA successfully escapes limit cycles and converges to the desired stationary point, while other methods either suffer from the recurrence behavior or diverge. Moreover, our algorithm exhibits robustness to parameter selection (see Section 4.2), offering practical flexibility. |
| Researcher Affiliation | Academia | Taoli Zheng (CUHK, tlzheng@se.cuhk.edu.hk); Linglingzhi Zhu (CUHK, llzzhu@se.cuhk.edu.hk); Anthony Man-Cho So (CUHK, manchoso@se.cuhk.edu.hk); José Blanchet (Stanford University, jose.blanchet@stanford.edu); Jiajin Li (Stanford University, jiajinli@stanford.edu) |
| Pseudocode | Yes | Algorithm 1: Doubly Smoothed GDA (DS-GDA) |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | KŁ-Nonconcave Example: The following example satisfies the two-sided KŁ property with exponent 1/2, which is studied in [58]: min_{x∈X} max_{y∈Y} x^2 + 3sin^2(x)sin^2(y) − 4y^2 − 10sin^2(y), (2) where X = Y = {z : −1 ≤ z ≤ 1}. The only saddle point is u* = [x*; y*] = [0; 0]. ... Nonconvex-Nonconcave Example: The nonconvex-nonconcave example considered here is the bilinearly-coupled minimax example (3) discussed in [29]: min_{x∈X} max_{y∈Y} f(x) + Axy − f(y), (3) where f(z) = (z + 1)(z − 1)(z + 3)(z − 3), A = 11, and X = Y = {z : −4 ≤ z ≤ 4}. |
| Dataset Splits | No | The paper defines mathematical examples for experiments but does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts). The sets X and Y define the domain, but not data splits in the conventional sense for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or cloud instance specifications). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | To be specific, by setting r1 = r2 = 0.125, c = α = 0.04, β = µ = 0.8, DS-GDA directly converges to the saddle point, which validates the universal results in Theorem 3. ... The extrapolation parameters z and v are initialized as x and y, respectively. According to Lemma 8, the algorithm terminates when ‖z − x‖ and ‖v − y‖ are less than 10^−6 or when the number of iterations exceeds 10^7. To ensure a fair comparison, the same initializations are used for DS-GDA and S-GDA in each test, and the algorithm parameters are tuned so that both DS-GDA and S-GDA are optimal. |
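To make the reported setup concrete, the following is a minimal sketch of a doubly smoothed GDA loop on the KŁ example (2), using the symmetric parameters quoted above (r1 = r2 = 0.125, c = α = 0.04, β = µ = 0.8). The exact update order in the paper's Algorithm 1 may differ; here we assume the common form in which x and y take projected descent/ascent steps on the surrogate F(x, y; z, v) = f(x, y) + (r1/2)(x − z)² − (r2/2)(y − v)², and the smoothing centers z and v follow with averaging steps. The gradient formulas and stopping rule below are derived from the quoted problem and setup; everything else is an illustrative assumption, not the authors' code.

```python
import math

# Problem (2): f(x, y) = x^2 + 3 sin^2(x) sin^2(y) - 4 y^2 - 10 sin^2(y)
# on X = Y = [-1, 1], with unique saddle point (x*, y*) = (0, 0).

def grad_x(x, y):
    # d/dx f = 2x + 3 sin(2x) sin^2(y)
    return 2.0 * x + 3.0 * math.sin(2.0 * x) * math.sin(y) ** 2

def grad_y(x, y):
    # d/dy f = 3 sin^2(x) sin(2y) - 8y - 10 sin(2y)
    return 3.0 * math.sin(x) ** 2 * math.sin(2.0 * y) - 8.0 * y - 10.0 * math.sin(2.0 * y)

def proj(u, lo=-1.0, hi=1.0):
    # Projection onto the box constraint X = Y = [-1, 1].
    return max(lo, min(hi, u))

def ds_gda(x, y, r1=0.125, r2=0.125, c=0.04, alpha=0.04,
           beta=0.8, mu=0.8, iters=5000, tol=1e-6):
    # Extrapolation variables initialized at (x, y), as in the paper's setup.
    z, v = x, y
    for _ in range(iters):
        # Projected descent/ascent on the doubly smoothed surrogate
        # F(x, y; z, v) = f(x, y) + (r1/2)(x - z)^2 - (r2/2)(y - v)^2.
        x_new = proj(x - c * (grad_x(x, y) + r1 * (x - z)))
        y_new = proj(y + alpha * (grad_y(x, y) - r2 * (y - v)))
        # Averaging updates for the smoothing centers.
        z += beta * (x_new - z)
        v += mu * (y_new - v)
        x, y = x_new, y_new
        # Stopping rule quoted in the setup: |z - x| and |v - y| below tol.
        if abs(z - x) < tol and abs(v - y) < tol:
            break
    return x, y

x_star, y_star = ds_gda(0.5, 0.5)
print(x_star, y_star)  # both iterates should end up near the saddle point (0, 0)
```

With these symmetric parameters the y-iterate contracts quickly (the −8y − 10 sin(2y) term dominates near the origin) and the x-iterate follows, so the sketch illustrates the direct convergence to the saddle point described in the quoted experiment setup.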