Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets

Authors: Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei Zhang, Xiaodong Cui, Payel Das, Tianbao Yang

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our experiments show that indeed adaptive gradient algorithms outperform their non-adaptive counterparts in GAN training. Moreover, this observation can be explained by the slow growth rate of the cumulative stochastic gradient, as observed empirically.
Researcher Affiliation | Collaboration | Mingrui Liu (1), Youssef Mroueh (2), Jerret Ross (2), Wei Zhang (2), Xiaodong Cui (2), Payel Das (2), Tianbao Yang (1); (1) Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA; (2) IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
Pseudocode | Yes | Algorithm 1: Optimistic Stochastic Gradient (OSG) ... Algorithm 2: Optimistic AdaGrad (OAdagrad). (A hedged sketch of both updates follows the table.)
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We use Wasserstein GAN with gradient penalty (WGAN-GP) (Gulrajani et al., 2017) and CIFAR10 data in our experiments. ... We use the model from Self-Attention GAN (Zhang et al., 2018) (SA-GAN) and ImageNet as our dataset. (A generic data-loading sketch follows the table.)
Dataset Splits | Yes | We use Wasserstein GAN with gradient penalty (WGAN-GP) (Gulrajani et al., 2017) and CIFAR10 data in our experiments. ... We use the model from Self-Attention GAN (Zhang et al., 2018) (SA-GAN) and ImageNet as our dataset.
Hardware Specification | No | The paper mentions "due to limited computational resources" but provides no specific details about the hardware (e.g., GPU models, CPU types) used for the experiments.
Software Dependencies | No | The paper mentions using the "PyTorch framework (Paszke et al., 2017)", but it does not specify a version number for PyTorch or any other software dependencies with their versions.
Experiment Setup | Yes | We try different batch sizes (64, 128, 256) for each algorithm. For each algorithm, we tune the learning rate in the range of {1×10^-3, 2×10^-4, 1×10^-4, 2×10^-5, 1×10^-5} when using batch size 64, and use the same learning rate for batch size 128 and 256. ... Training is performed with batch size 128 for all experiments. ... Specifically, the learning rates used are 10^-3 for the generator and 4×10^-5 for the discriminator. (These settings are collected in a configuration sketch below the table.)
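
The Pseudocode row names Algorithm 1 (OSG) and Algorithm 2 (OAdagrad). Both follow the optimistic-update template: the iterate is first extrapolated with the most recent stale stochastic gradient, and the real step is then taken with a fresh stochastic gradient evaluated at that extrapolated point; OAdagrad additionally rescales each coordinate by the square root of the cumulative sum of squared gradients, the quantity whose slow growth the paper ties to the adaptive speed-up. The sketch below only illustrates this template under those assumptions; the function names, argument layout, and indexing are made up here and do not reproduce the paper's Algorithms 1-2 verbatim.

```python
import numpy as np

def osg_round(z, stale_grad, grad_fn, eta):
    """One round of an optimistic stochastic gradient update (illustrative sketch).

    z          : current iterate (all generator/discriminator parameters, flattened)
    stale_grad : stochastic gradient computed at the previous extrapolated point
    grad_fn    : oracle returning a stochastic gradient of the min-max objective
    eta        : constant step size
    """
    x = z - eta * stale_grad   # extrapolate with the stale gradient ("optimism")
    g = grad_fn(x)             # fresh stochastic gradient at the extrapolated point
    z_next = z - eta * g       # take the real step with the fresh gradient
    return z_next, g           # g becomes the stale gradient of the next round


def oadagrad_round(z, stale_grad, grad_fn, eta, v, delta=1e-8):
    """One round of an AdaGrad-style optimistic update (illustrative sketch).

    v accumulates coordinate-wise squared stochastic gradients; its growth rate
    is what the paper monitors to explain why the adaptive variant converges faster.
    """
    x = z - (eta / (np.sqrt(v) + delta)) * stale_grad   # adaptive extrapolation
    g = grad_fn(x)                                      # fresh stochastic gradient
    v = v + g ** 2                                      # update cumulative squared gradients
    z_next = z - (eta / (np.sqrt(v) + delta)) * g       # adaptive correction step
    return z_next, g, v
```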
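For the Open Datasets row, CIFAR-10 is available directly through torchvision, which matches the PyTorch framework the paper reports using. The snippet below is a generic loading sketch under that assumption, not the authors' pipeline; the normalization and worker settings are illustrative choices.

```python
import torch
import torchvision
from torchvision import transforms

# Generic CIFAR-10 loading with torchvision (illustrative; not the authors' code).
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # scale images to [-1, 1]
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)

# Batch size 64 is one of the values reported in the Experiment Setup row.
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True, num_workers=2
)
```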
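The Experiment Setup row can be condensed into a small configuration sketch. The dictionary layout is a hypothetical summary made here, the exponents are reconstructed from the garbled extraction (reading, e.g., "1 10 3" as 1e-3), and the split of settings between the CIFAR-10 and ImageNet experiments is inferred from the ordering of the quote, so treat the values as a best-effort reading rather than a verified copy of the paper.

```python
# Hypothetical summary of the quoted hyperparameters; not the authors' script.
wgan_gp_cifar10 = {
    "batch_sizes": [64, 128, 256],               # every algorithm tried all three
    "lr_grid": [1e-3, 2e-4, 1e-4, 2e-5, 1e-5],   # tuned at batch size 64, reused at 128/256
}

sa_gan_imagenet = {
    "batch_size": 128,         # "Training is performed with batch size 128 for all experiments."
    "lr_generator": 1e-3,      # reconstructed from the garbled "10 3"
    "lr_discriminator": 4e-5,  # reconstructed from the garbled "4 10 5"
}
```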