Universal Gradient Descent Ascent Method for Nonconvex-Nonconcave Minimax Optimization

Authors: Taoli Zheng, Linglingzhi Zhu, Anthony Man-Cho So, José Blanchet, Jiajin Li

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate our idea and demonstrate the universality of DS-GDA, we provide a visual representation of the feasible symmetric parameter selections by relating the regularizer to the Lipschitz constants and step sizes. This graphical illustration, depicted in Figure 1a, reveals that numerous parameter settings can be chosen to guarantee convergence, which showcases the flexibility of DS-GDA. An experiment on a nonconvex-nonconcave problem has also been conducted to test the efficiency of symmetric parameters. ... In all cases, DS-GDA successfully escapes limit cycles and converges to the desired stationary point, while the other methods either suffer from recurrent behavior or diverge. Moreover, our algorithm exhibits robustness to parameter selection (see Section 4.2), offering practical flexibility.
Researcher Affiliation | Academia | Taoli Zheng (CUHK, tlzheng@se.cuhk.edu.hk); Linglingzhi Zhu (CUHK, llzzhu@se.cuhk.edu.hk); Anthony Man-Cho So (CUHK, manchoso@se.cuhk.edu.hk); José Blanchet (Stanford University, jose.blanchet@stanford.edu); Jiajin Li (Stanford University, jiajinli@stanford.edu)
Pseudocode | Yes | Algorithm 1: Doubly Smoothed GDA (DS-GDA). (A minimal sketch of the update rule appears after the table.)
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | KŁ-Nonconcave Example: The following example satisfies the two-sided KŁ property with exponent 1/2 and is studied in [58]: min_{x∈X} max_{y∈Y} x² + 3 sin²(x) sin²(y) − 4y² − 10 sin²(y), (2) where X = Y = {z : −1 ≤ z ≤ 1}. The only saddle point is u* = [x*; y*] = [0; 0]. ... Nonconvex-Nonconcave Example: The nonconvex-nonconcave example considered here is the bilinearly-coupled minimax example (3) discussed in [29]: min_{x∈X} max_{y∈Y} f(x) + Axy − f(y), (3) where f(z) = (z + 1)(z − 1)(z + 3)(z − 3), A = 11, and X = Y = {z : −4 ≤ z ≤ 4}. (Both objectives are written out in code after the table.)
Dataset Splits | No | The paper defines mathematical examples for its experiments but does not provide training/validation/test dataset splits (e.g., percentages or sample counts). The sets X and Y define the problem domain, not data splits in the conventional sense.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | To be specific, by setting r1 = r2 = 0.125, c = α = 0.04, β = µ = 0.8, DS-GDA directly converges to the saddle point, which validates the universal results in Theorem 3. ... The extrapolation parameters z and v are initialized as x and y, respectively. According to Lemma 8, our algorithm terminates when ∥x − z∥ and ∥y − v∥ are less than 10⁻⁶ or when the number of iterations exceeds 10⁷. To ensure a fair comparison, we use the same initializations for DS-GDA and S-GDA in each test. The algorithm parameters are tuned so that both DS-GDA and S-GDA are optimal. (See the experiment-loop sketch after the table.)
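
Since the paper provides Algorithm 1 only as pseudocode and releases no source code, the following is a minimal Python sketch of the DS-GDA update rule, assuming the standard doubly smoothed surrogate F(x, y, z, v) = f(x, y) + (r1/2)∥x − z∥² − (r2/2)∥y − v∥² with projected descent/ascent steps and exponential averaging of the extrapolation variables. The function names (ds_gda, grad_x, grad_y, proj_X, proj_Y) are ours, not the paper's.

```python
import numpy as np

def ds_gda(grad_x, grad_y, proj_X, proj_Y, x0, y0,
           r1=0.125, r2=0.125, c=0.04, alpha=0.04, beta=0.8, mu=0.8,
           tol=1e-6, max_iter=10**7):
    """Sketch of Doubly Smoothed GDA (DS-GDA).

    Gradient steps are taken on the smoothed surrogate
        F(x, y, z, v) = f(x, y) + (r1/2)||x - z||^2 - (r2/2)||y - v||^2,
    so grad_x and grad_y only need to return the partial gradients of f.
    """
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    z, v = x.copy(), y.copy()  # extrapolation variables, initialized at (x, y)
    for _ in range(max_iter):
        # projected descent in x and ascent in y on the surrogate F
        x = proj_X(x - c * (grad_x(x, y) + r1 * (x - z)))
        y = proj_Y(y + alpha * (grad_y(x, y) - r2 * (y - v)))
        # exponential averaging of the extrapolation variables
        z = z + beta * (x - z)
        v = v + mu * (y - v)
        # stop once the iterates and their extrapolations have merged
        if np.linalg.norm(x - z) < tol and np.linalg.norm(y - v) < tol:
            break
    return x, y
```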
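The two test problems quoted in the Open Datasets row can be written out directly. Below is a sketch of both objectives' partial gradients and box projections, using the signs reconstructed above (a transcription for illustration, not code from the paper):

```python
import numpy as np

# KL example (2): f(x, y) = x^2 + 3 sin^2(x) sin^2(y) - 4 y^2 - 10 sin^2(y)
def kl_grad_x(x, y):
    return 2 * x + 6 * np.sin(x) * np.cos(x) * np.sin(y) ** 2

def kl_grad_y(x, y):
    return (6 * np.sin(x) ** 2 * np.sin(y) * np.cos(y)
            - 8 * y - 20 * np.sin(y) * np.cos(y))

# Bilinearly-coupled example (3): f(x) + A x y - f(y), where
# f(z) = (z + 1)(z - 1)(z + 3)(z - 3) = z^4 - 10 z^2 + 9 and A = 11
A = 11.0

def poly_df(z):
    return 4 * z ** 3 - 20 * z  # f'(z)

def bc_grad_x(x, y):
    return poly_df(x) + A * y

def bc_grad_y(x, y):
    return A * x - poly_df(y)

# box projections onto X = Y = [-1, 1] and X = Y = [-4, 4]
proj_box1 = lambda u: np.clip(u, -1.0, 1.0)
proj_box4 = lambda u: np.clip(u, -4.0, 4.0)
```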
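Continuing from the two sketches above, the following usage example mirrors the protocol in the Experiment Setup row (r1 = r2 = 0.125, c = α = 0.04, β = µ = 0.8, tolerance 10⁻⁶, iteration cap 10⁷) on example (3). The random initialization and seed are assumptions; the paper only states that DS-GDA and S-GDA share the same initializations:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption; the paper reports none
x0, y0 = rng.uniform(-4.0, 4.0, size=2)

x_star, y_star = ds_gda(bc_grad_x, bc_grad_y, proj_box4, proj_box4, x0, y0,
                        r1=0.125, r2=0.125, c=0.04, alpha=0.04,
                        beta=0.8, mu=0.8, tol=1e-6, max_iter=10**7)
print(x_star, y_star)  # should approach a stationary point of example (3)
```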