Train simultaneously, generalize better: Stability of gradient-based minimax learners
Authors: Farzan Farnia, Asuman Ozdaglar
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we discuss the results of our numerical experiments and compare the generalization performance of GDA and PPM algorithms in convex-concave settings and single-step and multi-step gradient-based methods in non-convex non-concave GAN problems. Our numerical results also suggest that in general non-convex non-concave problems the models learned by simultaneous optimization algorithms can generalize better than the models learned by non-simultaneous optimization methods. |
| Researcher Affiliation | Academia | Farzan Farnia and Asuman Ozdaglar, Laboratory for Information & Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. |
| Pseudocode | No | The paper describes the update rules for GDA, GDmax, and PPM using mathematical equations (1), (2), and (3) in Section 3, but it does not include pseudocode blocks or clearly labeled algorithm sections. (A sketch of these update rules appears below the table.) |
| Open Source Code | No | The paper does not provide any concrete access (e.g., repository link, explicit statement of code release) to the source code for the methodology described. |
| Open Datasets | Yes | We trained the spectrally-normalized GAN (SN-GAN) problem over CIFAR-10 (Krizhevsky et al., 2009) and CelebA (Liu et al., 2018) datasets. |
| Dataset Splits | No | The paper states: "We divided the CIFAR-10 and CelebA datasets to 50,000, 160,000 training and 10,000, 40,000 test samples, respectively." It provides details for the training and test splits, but there is no explicit mention of a validation set or its size/proportion. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "standard Adam algorithm (Kingma & Ba, 2014)" but does not specify any software names with version numbers (e.g., Python, TensorFlow, PyTorch versions) or other library dependencies. |
| Experiment Setup | Yes | To optimize the empirical minimax risk, we applied stochastic GDA with stepsize parameters αw = αθ = 0.02 and stochastic PPM with parameter η = 0.02, each for T = 20,000 iterations. [...] We used the standard Adam algorithm (Kingma & Ba, 2014) with batch-size 100. For simultaneous optimization algorithms we applied 1:1 Adam descent-ascent with the parameters lr = 10⁻⁴, β1 = 0.5, β2 = 0.9 for both minimization and maximization updates. To apply a non-simultaneous algorithm, we used 100 Adam maximization steps per minimization step and increased the maximization learning rate to 5×10⁻⁴. We ran each GAN experiment for T = 100,000 iterations. (A sketch of the corresponding training loops appears below the table.) |
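
The update rules referenced in the Pseudocode row (equations (1)–(3) of the paper) can be summarized in a minimal NumPy sketch. The function names, the fixed-point approximation used for the implicit PPM step, and the bilinear toy objective below are illustrative assumptions, not code from the paper:

```python
import numpy as np

def gda_step(w, theta, grad_w, grad_theta, alpha_w=0.02, alpha_theta=0.02):
    """Simultaneous gradient descent ascent (GDA): both players update
    from the same iterate (w_t, theta_t)."""
    gw, gt = grad_w(w, theta), grad_theta(w, theta)
    return w - alpha_w * gw, theta + alpha_theta * gt

def gdmax_step(w, theta, grad_w, grad_theta, alpha_w=0.02, alpha_theta=0.02,
               ascent_steps=100):
    """Non-simultaneous GDmax: (approximately) maximize over theta first,
    then take a single descent step in w."""
    for _ in range(ascent_steps):
        theta = theta + alpha_theta * grad_theta(w, theta)
    return w - alpha_w * grad_w(w, theta), theta

def ppm_step(w, theta, grad_w, grad_theta, eta=0.02, inner_iters=10):
    """Proximal point method (PPM): the gradients are evaluated at the *next*
    iterate, so the update is implicit; here it is approximated by a few
    fixed-point iterations (an illustrative simplification)."""
    w_next, theta_next = w, theta
    for _ in range(inner_iters):
        w_next = w - eta * grad_w(w_next, theta_next)
        theta_next = theta + eta * grad_theta(w_next, theta_next)
    return w_next, theta_next

# Toy usage on a bilinear game f(w, theta) = w @ A @ theta (illustrative only).
A = np.array([[1.0, 0.5], [0.0, 1.0]])
grad_w = lambda w, th: A @ th      # gradient of f with respect to w
grad_th = lambda w, th: A.T @ w    # gradient of f with respect to theta
w, th = np.ones(2), np.ones(2)
for _ in range(1000):
    w, th = ppm_step(w, th, grad_w, grad_th)
```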
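
The Experiment Setup row describes 1:1 Adam descent-ascent for the simultaneous methods and 100 Adam maximization steps per minimization step for the non-simultaneous baseline. Below is a minimal PyTorch sketch of such a loop; since no official code is released, `G`, `D`, `gan_loss`, `sample_batch`, and `latent_dim` are hypothetical placeholders for the SN-GAN generator, discriminator, minimax objective, and data pipeline.

```python
import torch

def adam_descent_ascent(G, D, gan_loss, sample_batch, latent_dim,
                        iters=100_000, max_steps_per_min_step=1,
                        lr_min=1e-4, lr_max=1e-4, device="cpu"):
    """Sketch of the reported SN-GAN training loop (no official code exists).
    Defaults mirror the simultaneous 1:1 setting: Adam with lr = 1e-4 and
    betas = (0.5, 0.9) for both players; batch size 100 is assumed to be
    what sample_batch() returns."""
    opt_g = torch.optim.Adam(G.parameters(), lr=lr_min, betas=(0.5, 0.9))
    opt_d = torch.optim.Adam(D.parameters(), lr=lr_max, betas=(0.5, 0.9))
    for _ in range(iters):
        # Maximization (discriminator) updates on the minimax objective.
        for _ in range(max_steps_per_min_step):
            x = sample_batch().to(device)
            z = torch.randn(x.size(0), latent_dim, device=device)
            opt_d.zero_grad()
            (-gan_loss(D, G, x, z)).backward()  # ascent = descent on -f
            opt_d.step()
        # Minimization (generator) update.
        x = sample_batch().to(device)
        z = torch.randn(x.size(0), latent_dim, device=device)
        opt_g.zero_grad()
        gan_loss(D, G, x, z).backward()
        opt_g.step()
```

Calling the function with `max_steps_per_min_step=100` and `lr_max=5e-4` would mimic the reported non-simultaneous baseline, while the defaults correspond to the reported 1:1 simultaneous setting.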