GDA-AM: ON THE EFFECTIVENESS OF SOLVING MINIMAX OPTIMIZATION VIA ANDERSON MIXING

Authors: Huan He, Shifan Zhao, Yuanzhe Xi, Joyce Ho, Yousef Saad

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theoretical results with numerical simulations across a variety of minimax problems. We show that for some convex-concave and non-convex-concave functions, GDA-AM can converge to the optimal point with little hyper-parameter tuning, whereas existing first-order methods are prone to divergence and cycling behaviors. We also provide empirical results for GAN training across two different datasets, CIFAR10 and CelebA.
Researcher Affiliation | Academia | Huan He, Shifan Zhao, Yuanzhe Xi, Joyce C. Ho: Department of Computer Science, Emory University, Atlanta, GA 30329, USA. Yousef Saad: Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA.
Pseudocode | Yes | Algorithm 1: Anderson Mixing Prototype (truncated version) ... Algorithm 2: Simultaneous GDA-AM ... Algorithm 3: Alternating GDA-AM ... Algorithm 5: QR-updating procedures. (A minimal numerical sketch of simultaneous GDA-AM on a toy bilinear problem appears after this table.)
Open Source Code | Yes | Codes are available on Github: https://github.com/hehuannb/GDA-AM
Open Datasets | Yes | We apply our method to the CIFAR10 dataset (Krizhevsky, 2009) ... We also compared the performance of GDA-AM using cropped CelebA (64 × 64) (Liu et al., 2015). (A torchvision loading sketch for both datasets appears after this table.)
Dataset Splits | No | The paper states 'Experiments were run with 5 random seeds' and, for evaluation, 'Models are evaluated using the inception score (IS) (Salimans et al., 2016) and FID (Heusel et al., 2017) computed on 50,000 samples', but it does not specify explicit training/validation/test dataset splits (e.g., percentages or counts) or refer to standard splits with citations for reproducibility.
Hardware Specification | Yes | Experiments were run on one NVIDIA V100 GPU.
Software Dependencies | No | The paper states 'For our experiments, we used the PyTorch deep learning framework.' It mentions PyTorch but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | We use a learning rate of 2 × 10⁻⁴ and a batch size of 64. For the table size of GDA-AM, we set it to 120 for CIFAR10 and 150 for CelebA. We set β1 = 0.0 and β2 = 0.9, as we find this gives better models than the default settings. (An illustrative optimizer configuration using these values appears after this table.)
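
To make the pseudocode row concrete, here is a minimal numerical sketch of the idea behind simultaneous GDA-AM: plain simultaneous gradient descent-ascent spirals away from the saddle point of the bilinear game f(x, y) = x·y, while wrapping the same fixed-point map in standard (type-II) Anderson mixing converges to it. This is not the authors' implementation; the window size m, step size eta, mixing parameter (fixed at 1), and the unregularized least-squares solve are illustrative assumptions, and the truncation and QR-updating details of Algorithms 1-5 are omitted.

```python
import numpy as np

def gda_map(z, eta=0.1):
    """One simultaneous GDA step for the bilinear game f(x, y) = x * y.
    x takes a descent step on f; y takes an ascent step."""
    x, y = z
    return np.array([x - eta * y, y + eta * x])

def gda_am(z0, m=5, iters=30, eta=0.1):
    """Type-II Anderson mixing wrapped around the GDA fixed-point map (illustrative sketch)."""
    zs = [z0, gda_map(z0, eta)]                 # iterate history z_0, z_1, ...
    fs = [gda_map(z, eta) - z for z in zs]      # residuals f_k = g(z_k) - z_k
    for k in range(1, iters):
        mk = min(m, k)                          # history window (the paper's "table size")
        dF = np.column_stack([fs[i + 1] - fs[i] for i in range(k - mk, k)])
        dZ = np.column_stack([zs[i + 1] - zs[i] for i in range(k - mk, k)])
        gamma, *_ = np.linalg.lstsq(dF, fs[k], rcond=None)   # least-squares coefficients
        z_next = zs[k] + fs[k] - (dZ + dF) @ gamma           # mixing parameter beta = 1
        zs.append(z_next)
        fs.append(gda_map(z_next, eta) - z_next)
    return np.array(zs)

z0 = np.array([1.0, 1.0])
plain = [z0]
for _ in range(30):                              # plain simultaneous GDA spirals outward
    plain.append(gda_map(plain[-1]))
accel = gda_am(z0)
print("plain GDA   ||z_30|| =", np.linalg.norm(plain[-1]))   # norm grows every step
print("GDA-AM-like ||z_30|| =", np.linalg.norm(accel[-1]))    # ~0, the saddle point (0, 0)
```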
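
For the open-datasets row, the sketch below shows one common way to obtain CIFAR10 and 64 × 64 center-cropped CelebA with torchvision. The paper does not describe its data pipeline, so the crop size, resize, normalization, and download options here are assumptions rather than the authors' preprocessing.

```python
import torchvision
import torchvision.transforms as T

# Illustrative preprocessing: scale images to [-1, 1]; the paper does not spell this out.
cifar_tf = T.Compose([T.ToTensor(), T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
celeba_tf = T.Compose([
    T.CenterCrop(178),          # assumed crop of the aligned 178x218 CelebA faces
    T.Resize(64),               # "cropped CelebA (64 x 64)"
    T.ToTensor(),
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

cifar10 = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=cifar_tf)
celeba = torchvision.datasets.CelebA(root="./data", split="train", download=True, transform=celeba_tf)
```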
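
The experiment-setup row translates into the hypothetical PyTorch configuration below, assuming β1 and β2 refer to Adam's moment-decay parameters (a common GAN choice). `generator` and `discriminator` are placeholder modules, and the table size is shown only as a named hyper-parameter, since the Anderson-mixing update itself lives in the training loop rather than in the optimizer.

```python
import torch

# Hyper-parameters quoted in the paper (table size 150 would be used for CelebA).
lr, batch_size = 2e-4, 64
betas = (0.0, 0.9)               # beta1 = 0.0, beta2 = 0.9
anderson_table_size = 120        # CIFAR10 setting; window of stored iterates for GDA-AM

# Placeholder networks standing in for the actual GAN generator and discriminator.
generator = torch.nn.Linear(128, 3072)
discriminator = torch.nn.Linear(3072, 1)
opt_g = torch.optim.Adam(generator.parameters(), lr=lr, betas=betas)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=betas)
```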