Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets

Authors: Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei Zhang, Xiaodong Cui, Payel Das, Tianbao Yang

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our experiments show that indeed adaptive gradient algorithms outperform their non-adaptive counterparts in GAN training. Moreover, this observation can be explained by the slow growth rate of the cumulative stochastic gradient, as observed empirically.
Researcher Affiliation | Collaboration | Mingrui Liu (1), Youssef Mroueh (2), Jerret Ross (2), Wei Zhang (2), Xiaodong Cui (2), Payel Das (2), Tianbao Yang (1); (1) Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA; (2) IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
Pseudocode | Yes | Algorithm 1: Optimistic Stochastic Gradient (OSG) ... Algorithm 2: Optimistic AdaGrad (OAdagrad). (A hedged sketch of both updates follows the table.)
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We use Wasserstein GAN with gradient penalty (WGAN-GP) (Gulrajani et al., 2017) and CIFAR10 data in our experiments. ... We use the model from Self-Attention GAN (Zhang et al., 2018) (SA-GAN) and ImageNet as our dataset. (A generic data-loading sketch follows the table.)
Dataset Splits | Yes | We use Wasserstein GAN with gradient penalty (WGAN-GP) (Gulrajani et al., 2017) and CIFAR10 data in our experiments. ... We use the model from Self-Attention GAN (Zhang et al., 2018) (SA-GAN) and ImageNet as our dataset.
Hardware Specification | No | The paper mentions "due to limited computational resources" but provides no specific details about the hardware (e.g., GPU models, CPU types) used for the experiments.
Software Dependencies | No | The paper mentions using the "PyTorch framework (Paszke et al., 2017)", but it does not specify a version number for PyTorch or any other software dependencies with their versions.
Experiment Setup | Yes | We try different batch sizes (64, 128, 256) for each algorithm. For each algorithm, we tune the learning rate in the range of {1×10^-3, 2×10^-4, 1×10^-4, 2×10^-5, 1×10^-5} when using batch size 64, and use the same learning rate for batch size 128 and 256. ... Training is performed with batch size 128 for all experiments. ... Specifically, the learning rates used are 10^-3 for the generator and 4×10^-5 for the discriminator. (These settings are collected in a configuration sketch below the table.)
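
The Pseudocode row names Algorithm 1 (OSG) and Algorithm 2 (OAdagrad). Both follow the optimistic-update template: the iterate is first extrapolated with the most recent stale stochastic gradient, and the real step is then taken with a fresh stochastic gradient evaluated at that extrapolated point; OAdagrad additionally rescales each coordinate by the square root of the cumulative sum of squared gradients, the quantity whose slow growth the paper ties to the adaptive speed-up. The sketch below only illustrates this template under those assumptions; the function names, argument layout, and indexing are made up here and do not reproduce the paper's Algorithms 1-2 verbatim.

```python
import numpy as np

def osg_round(z, stale_grad, grad_fn, eta):
    """One round of an optimistic stochastic gradient update (illustrative sketch).

    z          : current iterate (all generator/discriminator parameters, flattened)
    stale_grad : stochastic gradient computed at the previous extrapolated point
    grad_fn    : oracle returning a stochastic gradient of the min-max objective
    eta        : constant step size
    """
    x = z - eta * stale_grad   # extrapolate with the stale gradient ("optimism")
    g = grad_fn(x)             # fresh stochastic gradient at the extrapolated point
    z_next = z - eta * g       # take the real step with the fresh gradient
    return z_next, g           # g becomes the stale gradient of the next round


def oadagrad_round(z, stale_grad, grad_fn, eta, v, delta=1e-8):
    """One round of an AdaGrad-style optimistic update (illustrative sketch).

    v accumulates coordinate-wise squared stochastic gradients; its growth rate
    is what the paper monitors to explain why the adaptive variant converges faster.
    """
    x = z - (eta / (np.sqrt(v) + delta)) * stale_grad   # adaptive extrapolation
    g = grad_fn(x)                                      # fresh stochastic gradient
    v = v + g ** 2                                      # update cumulative squared gradients
    z_next = z - (eta / (np.sqrt(v) + delta)) * g       # adaptive correction step
    return z_next, g, v
```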
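For the Open Datasets row, CIFAR-10 is available directly through torchvision, which matches the PyTorch framework the paper reports using. The snippet below is a generic loading sketch under that assumption, not the authors' pipeline; the normalization and worker settings are illustrative choices.

```python
import torch
import torchvision
from torchvision import transforms

# Generic CIFAR-10 loading with torchvision (illustrative; not the authors' code).
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # scale images to [-1, 1]
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)

# Batch size 64 is one of the values reported in the Experiment Setup row.
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True, num_workers=2
)
```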
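The Experiment Setup row can be condensed into a small configuration sketch. The dictionary layout is a hypothetical summary made here, the exponents are reconstructed from the garbled extraction (reading, e.g., "1 10 3" as 1e-3), and the split of settings between the CIFAR-10 and ImageNet experiments is inferred from the ordering of the quote, so treat the values as a best-effort reading rather than a verified copy of the paper.

```python
# Hypothetical summary of the quoted hyperparameters; not the authors' script.
wgan_gp_cifar10 = {
    "batch_sizes": [64, 128, 256],               # every algorithm tried all three
    "lr_grid": [1e-3, 2e-4, 1e-4, 2e-5, 1e-5],   # tuned at batch size 64, reused at 128/256
}

sa_gan_imagenet = {
    "batch_size": 128,         # "Training is performed with batch size 128 for all experiments."
    "lr_generator": 1e-3,      # reconstructed from the garbled "10 3"
    "lr_discriminator": 4e-5,  # reconstructed from the garbled "4 10 5"
}
```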