GDA-AM: ON THE EFFECTIVENESS OF SOLVING MINIMAX OPTIMIZATION VIA ANDERSON MIXING
Authors: Huan He, Shifan Zhao, Yuanzhe Xi, Joyce Ho, Yousef Saad
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical results with numerical simulations across a variety of minimax problems. We show that for some convex-concave and non-convex-concave functions, GDA-AM can converge to the optimal point with little hyper-parameter tuning whereas existing first-order methods are prone to divergence and cycling behaviors. We also provide empirical results for GAN training across two different datasets, CIFAR10 and CelebA. |
| Researcher Affiliation | Academia | Huan He, Shifan Zhao, Yuanzhe Xi, Joyce C. Ho (Department of Computer Science, Emory University, Atlanta, GA 30329, USA); Yousef Saad (Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA) |
| Pseudocode | Yes | Algorithm 1: Anderson Mixing Prototype (truncated version) ... Algorithm 2: Simultaneous GDA-AM ... Algorithm 3: Alternating GDA-AM ... Algorithm 5: QR-updating procedures (a minimal sketch of the simultaneous variant appears below this table) |
| Open Source Code | Yes | Codes are available on Github: https://github.com/hehuannb/GDA-AM |
| Open Datasets | Yes | We apply our method to the CIFAR10 dataset (Krizhevsky, 2009) ... We also compared the performance of GDA-AM using cropped CelebA (64×64) (Liu et al., 2015) |
| Dataset Splits | No | The paper states 'Experiments were run with 5 random seeds' and 'Models are evaluated using the inception score (IS) (Salimans et al., 2016) and FID (Heusel et al., 2017) computed on 50,000 samples.' for evaluation, but it does not specify explicit training/validation/test dataset splits (e.g., percentages or counts) or refer to standard splits with citations for reproducibility. |
| Hardware Specification | Yes | Experiments were run on one NVIDIA V100 GPU. |
| Software Dependencies | No | The paper states 'For our experiments, we used the PyTorch deep learning framework.' It names PyTorch but does not specify a version number for it or for any other software dependency. |
| Experiment Setup | Yes | We use a learning rate of 2×10⁻⁴ and a batch size of 64. For the table size of GDA-AM, we set it to 120 for CIFAR10 and 150 for CelebA. We set β1 = 0.0 and β2 = 0.9 as we find this gives us better models than the default settings. (A PyTorch sketch of these optimizer settings follows the table.) |
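
To make the pseudocode row concrete, here is a minimal NumPy sketch of the idea behind Simultaneous GDA-AM (Algorithm 2): the simultaneous GDA update is treated as a fixed-point map and accelerated with a truncated Type-II Anderson mixing window. The function names (`gda_am`, `grad_x`, `grad_y`), the history handling via a full least-squares solve, and all default values are illustrative assumptions; the paper's implementation uses QR-updating procedures (Algorithm 5) for efficiency instead.

```python
import numpy as np

def gda_am(grad_x, grad_y, x0, y0, eta=0.2, m=5, iters=100, tol=1e-8):
    """Simultaneous GDA viewed as a fixed-point iteration z -> g(z),
    accelerated with a truncated (Type-II) Anderson mixing window of size m."""
    nx = len(x0)

    def g(z):  # one simultaneous GDA step on the stacked variable z = (x, y)
        x, y = z[:nx], z[nx:]
        return np.concatenate([x - eta * grad_x(x, y),
                               y + eta * grad_y(x, y)])

    Z = [np.concatenate([x0, y0])]
    F = [g(Z[0]) - Z[0]]                       # residuals f(z) = g(z) - z
    for _ in range(iters):
        if np.linalg.norm(F[-1]) < tol:
            break
        if len(F) > 1:
            dF = np.stack([F[i + 1] - F[i] for i in range(len(F) - 1)], axis=1)
            dZ = np.stack([Z[i + 1] - Z[i] for i in range(len(Z) - 1)], axis=1)
            gamma, *_ = np.linalg.lstsq(dF, F[-1], rcond=None)
            z_new = Z[-1] + F[-1] - (dZ + dF) @ gamma   # Anderson extrapolation
        else:
            z_new = Z[-1] + F[-1]              # plain GDA step to seed the history
        Z.append(z_new)
        F.append(g(z_new) - z_new)
        Z, F = Z[-(m + 1):], F[-(m + 1):]      # keep only the last m+1 iterates
    z = Z[-1]
    return z[:nx], z[nx:]

# Bilinear game f(x, y) = x.T @ y: plain simultaneous GDA cycles or diverges,
# while the Anderson-mixed iteration drives (x, y) toward the saddle at (0, 0).
x_star, y_star = gda_am(lambda x, y: y, lambda x, y: x,
                        np.ones(2), np.ones(2))
print(x_star, y_star)
```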
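
For the experiment-setup row, the following is a hedged PyTorch sketch of the reported optimizer configuration (learning rate 2×10⁻⁴, batch size 64, Adam with β1 = 0.0 and β2 = 0.9) on CIFAR10. The placeholder networks are assumptions for self-containment; the paper's GAN architectures and the Anderson-mixing table-size settings (120 for CIFAR10, 150 for CelebA) are not reproduced here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hyper-parameters as reported: lr = 2e-4, batch size 64, β1 = 0.0, β2 = 0.9.
lr, batch_size, betas = 2e-4, 64, (0.0, 0.9)

loader = DataLoader(
    datasets.CIFAR10("data", train=True, download=True,
                     transform=transforms.ToTensor()),
    batch_size=batch_size, shuffle=True)

# Placeholder networks standing in for the paper's GAN architectures.
G = nn.Sequential(nn.Linear(128, 3 * 32 * 32), nn.Tanh())
D = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))

opt_G = torch.optim.Adam(G.parameters(), lr=lr, betas=betas)
opt_D = torch.optim.Adam(D.parameters(), lr=lr, betas=betas)
```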