On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems
Authors: Tianyi Lin, Chi Jin, Michael Jordan
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present several empirical results to show that two-time-scale GDA outperforms GDmax. The task is to train the empirical Wasserstein robustness model (WRM) (Sinha et al., 2018) over a collection of data samples {ξ_i}_{i=1}^N with ℓ2-norm attack and a penalty parameter γ > 0. |
| Researcher Affiliation | Academia | (1) Department of Industrial Engineering and Operations Research, UC Berkeley; (2) Department of Electrical Engineering, Princeton University; (3) Department of Statistics and Electrical Engineering and Computer Science, UC Berkeley. |
| Pseudocode | Yes | Algorithm 1 Two-Time-Scale GDA |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology described in this paper is publicly available. |
| Open Datasets | Yes | We mainly follow the setting of Sinha et al. (2018) and consider training a neural network classifier on three datasets¹: MNIST, Fashion-MNIST, and CIFAR-10, with the default cross validation. (¹ https://keras.io/datasets/) |
| Dataset Splits | No | While the paper mentions "with the default cross validation", it does not provide specific details on the dataset splits (e.g., exact percentages or sample counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or their version numbers, such as programming language versions, deep learning framework versions, or library versions used for the experiments. |
| Experiment Setup | Yes | Small and large adversarial perturbations are set with γ ∈ {0.4, 1.3}, the same as in Sinha et al. (2018). The baseline approach is denoted GDmax, in which ηx = ηy = 10⁻³ and each inner loop contains 20 gradient ascent steps. Two-time-scale GDA is denoted GDA, in which ηx = 5×10⁻⁵ and ηy = 10⁻³. (A minimal sketch of this update rule with these step sizes appears below the table.) |
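
For context on the Research Type row: the WRM task quoted there is, in the penalized form of Sinha et al. (2018), a nonconvex-concave minimax problem over the training samples {ξ_i}_{i=1}^N. A paraphrased version of that objective (not the paper's exact display; ℓ(θ; ·) denotes the classifier's loss) is:

```latex
\min_{\theta}\ \frac{1}{N}\sum_{i=1}^{N}\ \max_{\xi_i'}\ \Big\{ \ell(\theta;\,\xi_i') \;-\; \gamma\,\lVert \xi_i' - \xi_i \rVert_2^2 \Big\}
```

The outer minimization over the network parameters θ is nonconvex, while for γ sufficiently large relative to the smoothness of ℓ in ξ' the inner maximization over each adversarial sample ξ_i' is strongly concave, which is the structure the paper's GDA analysis targets.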
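The Pseudocode and Experiment Setup rows name Algorithm 1 (two-time-scale GDA): simultaneous gradient descent on the minimization variable and gradient ascent on the maximization variable, with a much smaller descent step size. Below is a minimal NumPy sketch of that update rule on a toy nonconvex-concave objective; the objective, dimensions, and iteration count are illustrative assumptions, and only the step sizes ηx = 5×10⁻⁵ and ηy = 10⁻³ are taken from the Experiment Setup row. It is not a reproduction of the paper's WRM experiments.

```python
import numpy as np

# Toy nonconvex-(strongly-)concave objective, chosen only for illustration:
#   f(x, y) = sum(cos(x)) + y^T (A x - b) - (gamma / 2) * ||y||^2
# Nonconvex in x (cosine term), strongly concave in y (quadratic penalty).
rng = np.random.default_rng(0)
d_x, d_y = 5, 3
A = rng.standard_normal((d_y, d_x))
b = rng.standard_normal(d_y)
gamma = 1.0

def grad_x(x, y):
    # d f / d x = -sin(x) + A^T y
    return -np.sin(x) + A.T @ y

def grad_y(x, y):
    # d f / d y = A x - b - gamma * y
    return A @ x - b - gamma * y

# Two-time-scale GDA: one simultaneous descent/ascent step per iteration,
# with eta_x much smaller than eta_y (values from the Experiment Setup row).
eta_x, eta_y = 5e-5, 1e-3
x = rng.standard_normal(d_x)
y = np.zeros(d_y)

for _ in range(50_000):
    gx, gy = grad_x(x, y), grad_y(x, y)      # gradients evaluated at the same point
    x, y = x - eta_x * gx, y + eta_y * gy    # simultaneous descent/ascent update

print("||grad_x|| =", np.linalg.norm(grad_x(x, y)))
print("||grad_y|| =", np.linalg.norm(grad_y(x, y)))
```

For contrast, the GDmax baseline described in the Experiment Setup row would replace the single ascent step with an inner loop of 20 gradient ascent steps on y (at ηy = 10⁻³) before each descent step on x.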