On Convergence of Gradient Descent Ascent: A Tight Local Analysis

Authors: Haochuan Li, Farzan Farnia, Subhro Das, Ali Jadbabaie

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we conduct several numerical experiments to support our theoretical findings. We conduct experiments on quadratic NC-SC functions to illustrate the convergence behaviors under different stepsize ratios. Finally, we conduct experiments on GANs to show that simultaneous GDA with ηx = ηy enjoys fast convergence to a desired solution.
Researcher Affiliation | Collaboration | (1) Department of EECS, Massachusetts Institute of Technology; (2) Department of CSE, The Chinese University of Hong Kong; (3) MIT-IBM Watson AI Lab, IBM Research; (4) Department of CEE, Massachusetts Institute of Technology.
Pseudocode | No | The paper describes algorithms using mathematical equations (e.g., equations 4 and 5) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not contain any statement about releasing source code or a direct link to a code repository.
Open Datasets | Yes | For both MNIST and CIFAR10, we train WGAN-GP models (Gulrajani et al., 2017) using simultaneous GDA with ηx = ηy = 0.001.
Dataset Splits | No | The paper mentions training on MNIST and CIFAR10 and evaluating performance, but it does not specify any training/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions software like 'ADAM' and 'WGAN-GP models', but it does not specify any version numbers for these or other software dependencies (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | We set z = 0 w.l.o.g. as it does not affect the convergence behavior. The matrices A, B, C ∈ R^{4×4} are randomly generated and processed to satisfy Assumption 5.1. We choose L = 100, µ = 1 in the beginning and compute µx after the matrices are sampled. We keep ηy = 1/(2L) and change the stepsize ratio by varying ηx. The WGAN-GP model in (Gulrajani et al., 2017) was trained using ADAM, a variant of GDA, with ηx = ηy = 0.0001. Our experiment uses the same WGAN-GP model. However, different from their algorithm, we use simultaneous GDA with the same number of gradient steps for both variables. As we can see, simultaneous GDA with ηx = ηy = 0.001 is able to converge with a high speed for both MNIST and CIFAR10.
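
Since the paper provides no pseudocode or released code, the sketch below illustrates what the quadratic NC-SC experiment quoted in the Experiment Setup row could look like. The quadratic form f(x, y) = ½xᵀAx + xᵀBy − ½yᵀCy, the way the random matrices are constructed, and all variable names are assumptions made for illustration; the authors' actual construction under Assumption 5.1 may differ.

```python
# Minimal sketch (not the authors' code) of simultaneous GDA on a quadratic
# NC-SC problem, assuming f(x, y) = 0.5*x^T A x + x^T B y - 0.5*y^T C y.
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Assumed stand-in for "randomly generated and processed to satisfy
# Assumption 5.1": A symmetric (possibly indefinite, hence nonconvex in x),
# C symmetric positive definite (strongly concave in y).
A = rng.standard_normal((d, d)); A = 0.5 * (A + A.T)
B = rng.standard_normal((d, d))
C = np.eye(d)

L, mu = 100.0, 1.0          # values quoted in the Experiment Setup row (not derived here)
eta_y = 1.0 / (2.0 * L)     # kept fixed, as described in the experiment setup
eta_x = 0.1 * eta_y         # the stepsize ratio eta_x / eta_y is the quantity varied

x = rng.standard_normal(d)
y = rng.standard_normal(d)

for _ in range(20000):
    grad_x = A @ x + B @ y      # gradient of f with respect to x
    grad_y = B.T @ x - C @ y    # gradient of f with respect to y
    # Simultaneous GDA: both gradients are evaluated at the same iterate,
    # then x takes a descent step and y takes an ascent step.
    x, y = x - eta_x * grad_x, y + eta_y * grad_y

final_grad = np.concatenate([A @ x + B @ y, B.T @ x - C @ y])
print("final gradient norm:", np.linalg.norm(final_grad))
```

Sweeping eta_x while holding eta_y fixed, as the quoted setup describes, would reproduce the kind of stepsize-ratio comparison the paper reports.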
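For the GAN experiment, the quoted setup distinguishes simultaneous GDA from the usual alternating WGAN-GP schedule: both players' gradients are computed at the same iterate before either is updated, and both take the same number of steps. The PyTorch-style sketch below is an assumed illustration of that single update; `generator`, `critic`, and `latent_dim` are hypothetical placeholders, and the gradient penalty term is omitted for brevity.

```python
# Assumed illustration of one simultaneous GDA step for WGAN-style training
# (not the authors' training code; gradient penalty and model definitions omitted).
import torch


def simultaneous_gda_step(generator, critic, real_batch, latent_dim,
                          eta_x=1e-3, eta_y=1e-3):
    z = torch.randn(real_batch.size(0), latent_dim)
    fake_batch = generator(z)

    # Minimax objective f(generator, critic) = E[D(real)] - E[D(fake)].
    f = critic(real_batch).mean() - critic(fake_batch).mean()

    # Evaluate both gradients at the *same* iterate before updating either player.
    gen_grads = torch.autograd.grad(f, list(generator.parameters()),
                                    retain_graph=True)
    critic_grads = torch.autograd.grad(f, list(critic.parameters()))

    with torch.no_grad():
        for p, g in zip(generator.parameters(), gen_grads):
            p -= eta_x * g   # min player: gradient descent on f
        for p, g in zip(critic.parameters(), critic_grads):
            p += eta_y * g   # max player: gradient ascent on f
```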