On Convergence of Gradient Descent Ascent: A Tight Local Analysis
Authors: Haochuan Li, Farzan Farnia, Subhro Das, Ali Jadbabaie
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct several numerical experiments to support our theoretical findings. We conduct experiments on quadratic NC-SC functions to illustrate the convergence behaviors under different stepsize ratios, and experiments on GANs to show that simultaneous GDA with ηx = ηy enjoys fast convergence to a desired solution. |
| Researcher Affiliation | Collaboration | (1) Department of EECS, Massachusetts Institute of Technology; (2) Department of CSE, The Chinese University of Hong Kong; (3) MIT-IBM Watson AI Lab, IBM Research; (4) Department of CEE, Massachusetts Institute of Technology. |
| Pseudocode | No | The paper describes algorithms using mathematical equations (e.g., equations 4 and 5) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a direct link to a code repository. |
| Open Datasets | Yes | For both MNIST and CIFAR10, we train WGAN-GP models (Gulrajani et al., 2017) using simultaneous GDA with ηx = ηy = 0.001. |
| Dataset Splits | No | The paper mentions training on MNIST and CIFAR10 and evaluating performance, but it does not specify any training/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'ADAM' and 'WGAN-GP models', but it does not specify any version numbers for these or other software dependencies (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | We set z = 0 w.l.o.g. as it does not affect the convergence behavior. The matrices A, B, C ∈ ℝ^(4×4) are randomly generated and processed to satisfy Assumption 5.1. We choose L = 100, µ = 1 in the beginning and compute µx after the matrices are sampled. We keep ηy = 1/(2L) and change the stepsize ratio by varying ηx. The WGAN-GP model in (Gulrajani et al., 2017) was trained using ADAM, a variant of GDA, with ηx = ηy = 0.0001. Our experiment uses the same WGAN-GP model. However, different from their algorithm, we use simultaneous GDA with the same number of gradient steps for both variables. As we can see, simultaneous GDA with ηx = ηy = 0.001 is able to converge with a high speed for both MNIST and CIFAR10. |
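
For readers who want a feel for the quadratic NC-SC experiment described in the Experiment Setup row, a minimal NumPy sketch of simultaneous GDA is given below. The quadratic form f(x, y) = ½xᵀAx + xᵀBy − ½yᵀCy, the random matrix construction, and the particular stepsize ratio used here are assumptions made for illustration; the paper's exact objective and its post-processing of the matrices to satisfy Assumption 5.1 are not reproduced, since no code was released.

```python
import numpy as np

# Illustrative sketch (not the authors' code): simultaneous GDA on a random
# 4x4 quadratic problem f(x, y) = 0.5 x^T A x + x^T B y - 0.5 y^T C y,
# with eta_y = 1/(2L) fixed and the stepsize ratio controlled through eta_x.

rng = np.random.default_rng(0)
d = 4

A = rng.standard_normal((d, d))
A = 0.5 * (A + A.T)            # symmetric, possibly indefinite -> nonconvex in x
B = rng.standard_normal((d, d))
Q = rng.standard_normal((d, d))
C = Q @ Q.T + np.eye(d)        # positive definite -> strongly concave in y

L = 100.0                      # smoothness constant reported in the paper
eta_y = 1.0 / (2.0 * L)        # kept fixed, as in the paper
eta_x = eta_y / 20.0           # one choice of ratio; the paper sweeps this value

x = rng.standard_normal(d)
y = rng.standard_normal(d)

for _ in range(50_000):
    grad_x = A @ x + B @ y     # gradient of f with respect to x
    grad_y = B.T @ x - C @ y   # gradient of f with respect to y
    # Simultaneous GDA: both updates are computed from the same iterate (x, y).
    x, y = x - eta_x * grad_x, y + eta_y * grad_y

grad_norm = np.linalg.norm(np.concatenate([A @ x + B @ y, B.T @ x - C @ y]))
print(f"gradient norm after simultaneous GDA: {grad_norm:.3e}")
```

Whether iterates of this kind converge or diverge depends on the stepsize ratio ηx/ηy, which is the behavior the paper's stepsize-ratio experiments are designed to illustrate.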